The Simplest Case: One-Way Treatment Structure
in a Completely Randomized Design Structure with
Homogeneous Errors


Suppose an experimenter wants to compare the effects of several different treatments, 
such as the effects of different drugs on people's heart rates or the yields of several different
varieties of wheat. Often the first step in analyzing the data from such experiments
is to use a statistical method, known as a one-way analysis of variance model, to describe 
the data. The model on which the one-way analysis of variance is based is one of the most 
useful models in the field of statistics. Many experimental situations are simply special 
cases of this model. Other models that appear to be much more complicated can often be 
considered as one-way models. This chapter is divided into several sections. In the first 
two sections, the one-way model is defined and the estimation of its parameters is 
discussed. In Sections 1.3 and 1.5, inference procedures for specified linear combinations 
of the treatment effects are provided. In Sections 1.7 and 1.9, we introduce two basic methods
for developing test statistics. These two methods are used extensively throughout
the remainder of the book. Finally, in Section 1.11, we discuss readily available computer 
analyses that use the above techniques. An example is used to demonstrate the concepts 
and computations described in each section. 


1.1 Model Definitions and Assumptions 

Assume that a sample of N experimental units is selected completely at random from a 
population of possible experimental units. An experimental unit is defined as the basic 
unit to which a treatment will be applied and independently observed. A more complete 
description of experimental units can be found in Chapters 4 and 5. 

In order to compare the effects of $t$ different treatments, the sample of $N$ experimental
units is randomly divided into $t$ groups so that there are $n_i$ experimental units in the $i$th
group, where $i = 1, 2, \ldots, t$, and $N = \sum_{i=1}^{t} n_i$. Grouping the experimental units at random into
$t$ groups should remove any systematic biases. That is, randomness should ensure that
the $t$ groups of experimental units are similar in nature before the treatments are applied.
Finally, one of the $t$ treatments should be randomly assigned to each group of experimental
units. Equivalently, the experimental units could be randomly assigned to the
$t$ treatment groups using some randomization device such as placing in a bowl $n_1$ tags
marked with treatment 1, $n_2$ tags marked with treatment 2, ..., and $n_t$ tags marked with treatment $t$,
mixing the tags, and then randomly selecting tags from the bowl to determine the
treatment assigned to each experimental unit. This process of using tags in a bowl
can be carried out using software that has random number generation
capabilities.
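
The tags-in-a-bowl device can be mimicked in software. The following is a minimal Python sketch, not a prescribed procedure: the function name `randomize_units`, the unit labels 1 to 12, the group sizes, and the fixed seed are all illustrative assumptions.

```python
import random

def randomize_units(units, group_sizes, seed=None):
    """Randomly assign units to treatments 1..t with the given group sizes."""
    if sum(group_sizes) != len(units):
        raise ValueError("group sizes must sum to the number of units")
    rng = random.Random(seed)
    # One tag per unit: n_1 tags marked 1, n_2 tags marked 2, and so on.
    tags = [trt for trt, n in enumerate(group_sizes, start=1) for _ in range(n)]
    rng.shuffle(tags)  # "mix the tags in the bowl"
    # Pair each unit with the tag drawn for it.
    return dict(zip(units, tags))

# Hypothetical example: 12 units, 3 treatments, 4 units per treatment.
assignment = randomize_units(units=list(range(1, 13)), group_sizes=[4, 4, 4], seed=2024)
```

Fixing the seed makes the assignment reproducible, which can help when the randomization needs to be documented.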

Let $y_{ij}$ denote a response from the $j$th experimental unit assigned to the $i$th treatment. The
values $y_{11}, y_{12}, \ldots, y_{1n_1}$ can be thought of as being a random sample of size $n_1$ from a population
with mean $\mu_1$ and variance $\sigma_1^2$, the values $y_{21}, y_{22}, \ldots, y_{2n_2}$ can be thought of as being a
random sample of size $n_2$ from a population with mean $\mu_2$ and variance $\sigma_2^2$, and similarly
for $i = 3, 4, \ldots, t$. The parameters $\mu_i$ and $\sigma_i^2$ represent the population mean and population
variance if one applied treatment $i$ to the whole population of experimental units.

The simplest case is considered in this chapter in that the variances are assumed to be
homogeneous or equal across treatments, or $\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_t^2$. That is, it is assumed that
the application of the $i$th treatment to the experimental units may affect the mean of the
responses but not the variance of the responses. The equal variance assumption, as well as
the analysis of variance with unequal variances, is discussed in Chapter 2.

The basic objectives of a good statistical analysis are to estimate the parameters of the 
model and to make inferences about them. The methods of inference usually include 
testing hypotheses and constructing confidence intervals. 

There are several ways to write a model for data from situations like the one described
above. The first model to be used is called the $\mu_i$ model or the means model. The means
model is:

$$y_{ij} = \mu_i + \varepsilon_{ij}, \quad i = 1, 2, \ldots, t, \quad j = 1, 2, \ldots, n_i$$

where it is assumed that

$$\varepsilon_{ij} \sim \text{i.i.d. } N(0, \sigma^2), \quad i = 1, 2, \ldots, t, \quad j = 1, 2, \ldots, n_i \tag{1.1}$$

The notation $\varepsilon_{ij} \sim$ i.i.d. $N(0, \sigma^2)$ is used extensively throughout this book. It means that the
$\varepsilon_{ij}$ ($i = 1, 2, \ldots, t$; $j = 1, 2, \ldots, n_i$) are independently and identically distributed and that the
sampling distribution of each is the normal distribution with mean equal to zero and
variance equal to $\sigma^2$.
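
To make the means model concrete, the sketch below generates artificial data from $y_{ij} = \mu_i + \varepsilon_{ij}$ under the assumptions in Equation 1.1. It is only an illustration: the function name `simulate_means_model` and the parameter values (three treatment means, unequal group sizes, $\sigma = 5.5$) are hypothetical and are not taken from any data set in this chapter.

```python
import random

def simulate_means_model(mu, n, sigma, seed=None):
    """Draw y_ij = mu_i + e_ij with e_ij i.i.d. N(0, sigma^2)."""
    rng = random.Random(seed)
    return [[m + rng.gauss(0.0, sigma) for _ in range(size)]
            for m, size in zip(mu, n)]

# Hypothetical parameter values chosen only for illustration.
y = simulate_means_model(mu=[30.0, 35.0, 28.0], n=[5, 4, 6], sigma=5.5, seed=1)

# Each group's sample mean estimates its own mu_i.
group_means = [sum(obs) / len(obs) for obs in y]
```

Note that every group shares the same `sigma`, which is the homogeneous errors assumption of this chapter; only the means differ across groups.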


1.2 Parameter Estimation 

The most important aspect of a statistical analysis is to get a good estimate of the error
variance per experimental unit, namely $\sigma^2$. The error variance measures the accuracy of an
experiment: the smaller the $\sigma^2$, the more accurate the experiment. One cannot make any
statistically valid inferences in any experiment or study without some knowledge of the
experimental error variance.

In the above situation, the $i$th sample, $i = 1, 2, \ldots, t$, provides an estimate of $\sigma^2$ when $n_i > 1$.
The estimate of $\sigma^2$ obtained from the data from the $i$th treatment is

$$\hat{\sigma}_i^2 = \sum_{j=1}^{n_i} \frac{(y_{ij} - \bar{y}_{i\cdot})^2}{n_i - 1}$$

which is an unbiased estimate of $\sigma^2$ where

$$\bar{y}_{i\cdot} = \frac{\sum_{j=1}^{n_i} y_{ij}}{n_i}$$

The estimate of $\sigma^2$ from the $i$th treatment is $\hat{\sigma}_i^2$, which is based on $n_i - 1$ degrees of freedom,
and the sampling distribution of $(n_i - 1)\hat{\sigma}_i^2/\sigma^2$ is a chi-square distribution with $n_i - 1$ degrees
of freedom.

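A weighted average of these $t$ independent estimates of $\sigma^2$ provides the best estimate for
$\sigma^2$ possible for this situation, where each estimate of the variance is weighted by its corresponding
degrees of freedom. The best estimate of $\sigma^2$ is

$$\hat{\sigma}^2 = \sum_{i=1}^{t} (n_i - 1)\hat{\sigma}_i^2 \Big/ \sum_{i=1}^{t} (n_i - 1)$$

For computational purposes, each variance times its weight can be expressed as

$$(n_i - 1)\hat{\sigma}_i^2 = \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i\cdot})^2 = \sum_{j=1}^{n_i} y_{ij}^2 - n_i \bar{y}_{i\cdot}^2 = \sum_{j=1}^{n_i} y_{ij}^2 - (y_{i\cdot})^2/n_i = SS_i$$

where $y_{i\cdot} = \sum_{j=1}^{n_i} y_{ij}$. Then the pooled estimate of the variance is

$$\hat{\sigma}^2 = \frac{SS_1 + SS_2 + \cdots + SS_t}{(n_1 - 1) + (n_2 - 1) + \cdots + (n_t - 1)} = \frac{\sum_{i=1}^{t} SS_i}{N - t}$$

The pooled estimate of the variance $\hat{\sigma}^2$ is based on $N - t$ degrees of freedom and the
sampling distribution of $(N - t)\hat{\sigma}^2/\sigma^2$ is a chi-square distribution with $N - t$ degrees of
freedom; that is, $(N - t)\hat{\sigma}^2/\sigma^2 \sim \chi^2_{N-t}$.

The best estimate of each $\mu_i$ is $\hat{\mu}_i = \bar{y}_{i\cdot}$, $i = 1, 2, \ldots, t$.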
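
As a check on the computational formulas, the sketch below evaluates $SS_i$, the pooled variance, and the treatment means. It assumes the pulse counts are transcribed column by column from Table 1.1 (the task and pulse rate example of Section 1.4); the dictionary name `pulse` and the helper `within_ss` are illustrative choices.

```python
# Pulse counts (pulsations per 20 s) for the six tasks, transcribed from Table 1.1.
pulse = {
    1: [27, 31, 26, 32, 39, 37, 38, 39, 30, 28, 27, 27, 34],
    2: [29, 28, 37, 24, 35, 40, 40, 31, 30, 25, 29, 25],
    3: [34, 36, 34, 41, 30, 44, 44, 32, 32, 31],
    4: [34, 34, 43, 44, 40, 47, 34, 31, 45, 28],
    5: [28, 28, 26, 35, 31, 30, 34, 34, 26, 20, 41, 21],
    6: [28, 26, 29, 25, 35, 34, 37, 28, 21, 28, 26],
}

def within_ss(y):
    """SS_i = sum(y^2) - (sum y)^2 / n, the corrected sum of squares."""
    return sum(v * v for v in y) - sum(y) ** 2 / len(y)

ss = {i: within_ss(y) for i, y in pulse.items()}
N = sum(len(y) for y in pulse.values())   # total number of observations
t = len(pulse)                            # number of treatments
sigma2_hat = sum(ss.values()) / (N - t)   # pooled estimate of sigma^2
mu_hat = {i: sum(y) / len(y) for i, y in pulse.items()}
```

With these data the pooled estimate agrees with the value 30.9045 on 62 degrees of freedom reported for the example.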

Under the assumption given in Equation 1.1, the sampling distribution of $\hat{\mu}_i$ is normal
with mean $\mu_i$ and variance $\sigma^2/n_i$. That is,

$$\hat{\mu}_i \sim N\left(\mu_i, \frac{\sigma^2}{n_i}\right), \quad i = 1, 2, \ldots, t \tag{1.2}$$

Using the sampling distributions of $\hat{\mu}_i$ and $\hat{\sigma}^2$, then

$$t_i = \frac{\hat{\mu}_i - \mu_i}{\sqrt{\hat{\sigma}^2/n_i}} \sim t_{N-t}, \quad i = 1, 2, \ldots, t \tag{1.3}$$

That is, the sampling distribution of $t_i$ is the t-distribution with $N - t$ degrees of freedom.
In addition, $\hat{\mu}_1, \hat{\mu}_2, \ldots, \hat{\mu}_t$ and $\hat{\sigma}^2$ are statistically independent.


1.3 Inferences on Linear Combinations—Tests and Confidence Intervals 

This section provides tests of hypotheses and confidence intervals for linear functions of
the parameters in the means model. The results in the previous section can be used to test
hypotheses about the individual $\mu_i$. Those results can also be used to test hypotheses about
linear combinations of the $\mu_i$ or to construct confidence intervals for linear combinations
of the $\mu_i$.

For an experiment involving several treatments, the investigator selects the treatments 
to be in the study because there are interesting hypotheses that need to be studied. These 
interesting hypotheses form the objectives of the study. The hypotheses involving the 
treatment means most likely will involve specific linear combinations of the means. These 
linear combinations will enable the investigator to compare the effects of the different 
treatments or, equivalently, the means of the different treatments or populations. The 
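hypotheses about the means that the experimenter has selected are typically of the
following types:

$$H_{01}: \sum_{i=1}^{t} c_i \mu_i = a \quad \text{vs} \quad H_{a1}: (\text{not } H_{01})$$

for some set of known constants $c_1, c_2, \ldots, c_t$ and $a$,

$$H_{02}: \mu_1 = \mu_2 = \cdots = \mu_t \quad \text{vs} \quad H_{a2}: (\text{not } H_{02})$$

and

$$H_{03}: \mu_i = \mu_{i'} \text{ for some } i \neq i' \quad \text{vs} \quad H_{a3}: (\text{not } H_{03})$$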

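For a linear combination such as that given in $H_{01}$, one can show that

$$\frac{\sum_{i=1}^{t} c_i \hat{\mu}_i - \sum_{i=1}^{t} c_i \mu_i}{\sqrt{\hat{\sigma}^2 \sum_{i=1}^{t} c_i^2/n_i}} \sim t_{(N-t)} \tag{1.4}$$

This result can be used to make inferences about linear combinations of the form $\sum_{i=1}^{t} c_i \mu_i$.
Since the hypothesis in $H_{03}$ can be written as $H_{03}: \mu_i - \mu_{i'} = 0$, it is a special case of $H_{01}$ with
$c_i = 1$, $c_{i'} = -1$, and $c_k = 0$ if $k \neq i$ or $i'$. A test for $H_{02}$ is given in Section 1.5. The estimated standard
error of $\sum_{i=1}^{t} c_i \hat{\mu}_i$ is given by

$$\widehat{\text{s.e.}}\left(\sum c_i \hat{\mu}_i\right) = \sqrt{\hat{\sigma}^2 \sum c_i^2/n_i} \tag{1.5}$$

To test $H_{01}: \sum_{i=1}^{t} c_i \mu_i = a$ vs $H_{a1}$: (not $H_{01}$) compute the t-statistic

$$t_c = \frac{\sum c_i \hat{\mu}_i - a}{\widehat{\text{s.e.}}\left(\sum c_i \hat{\mu}_i\right)} \tag{1.6}$$

If $|t_c| > t_{\alpha/2,\nu}$, where $\nu = N - t$, then $H_{01}$ is rejected at the $\alpha \times 100\%$ significance level, where
$t_{\alpha/2,\nu}$ is the upper $\alpha/2$ critical point of a t-distribution with $\nu$ degrees of freedom. A $(1 - \alpha)$
100% confidence interval for $\sum_{i=1}^{t} c_i \mu_i$ is provided by

$$\sum c_i \hat{\mu}_i \pm t_{\alpha/2,\nu} \, \widehat{\text{s.e.}}\left(\sum c_i \hat{\mu}_i\right) \tag{1.7}$$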
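
Equations 1.5 through 1.7 translate directly into a few lines of code. The sketch below is one possible arrangement, with hypothetical function names `linear_combination` and `confidence_interval`; it is applied to the summary statistics of the task and pulse rate example of Section 1.4, using the contrast $\mu_4 - \mu_5$. The critical value must be supplied by the user, because the Python standard library has no t-distribution quantile function.

```python
import math

def linear_combination(c, mu_hat, n, sigma2_hat, a=0.0):
    """Return (estimate, standard error, t statistic) for H0: sum(c_i * mu_i) = a."""
    estimate = sum(ci * mi for ci, mi in zip(c, mu_hat))
    se = math.sqrt(sigma2_hat * sum(ci ** 2 / ni for ci, ni in zip(c, n)))  # Equation 1.5
    return estimate, se, (estimate - a) / se                                # Equation 1.6

def confidence_interval(estimate, se, t_crit):
    """Equation 1.7; t_crit is the upper alpha/2 point with N - t degrees of freedom."""
    return estimate - t_crit * se, estimate + t_crit * se

# Summary statistics reported for the task and pulse rate example.
mu_hat = [31.9231, 31.0833, 35.8000, 38.0000, 29.5000, 28.8182]
n = [13, 12, 10, 10, 12, 11]
sigma2_hat = 30.9045

est, se, t_c = linear_combination([0, 0, 0, 1, -1, 0], mu_hat, n, sigma2_hat)
# 2.00 is the tabled value of t(0.025, 62) used in the text.
low, high = confidence_interval(est, se, t_crit=2.00)
```

Passing a nonzero `a` tests a hypothesis such as $\mu_3 = 30$ with the same function.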


1.4 Example—Tasks and Pulse Rate 

The data in Table 1.1 came from an experiment that was conducted to determine how 
six different kinds of work tasks affect a worker's pulse rate. In this experiment, 78 male 
workers were assigned at random to six different groups so that there were 13 workers in 
each group. Each group of workers was trained to perform their assigned task. On a 
selected day after training, the pulse rates of the workers were measured after they had 
performed their assigned tasks for 1 h. Unfortunately, some individuals withdrew from 
the experiment during the training process so that some groups contained fewer than 
13 individuals. The recorded data represent the number of heart pulsations in 20 s where
there are $N = 68$ observations and the total is $y_{\cdot\cdot} = 2197$.

For the tasks data, the best estimate of $\sigma^2$ is

$$\hat{\sigma}^2 = \sum_{i=1}^{6} SS_i \Big/ (N - t) = 1{,}916.0761/62 = 30.9045$$

which is based on 62 degrees of freedom. The best estimates of the $\mu_i$ are $\hat{\mu}_1 = 31.923$,
$\hat{\mu}_2 = 31.083$, $\hat{\mu}_3 = 35.800$, $\hat{\mu}_4 = 38.000$, $\hat{\mu}_5 = 29.500$, and $\hat{\mu}_6 = 28.818$.

TABLE 1.1

Pulsation Data and Summary Information for Six Tasks

                                                Task
                             1         2         3         4         5         6
                            27        29        34        34        28        28
                            31        28        36        34        28        26
                            26        37        34        43        26        29
                            32        24        41        44        35        25
                            39        35        30        40        31        35
                            37        40        44        47        30        34
                            38        40        44        34        34        37
                            39        31        32        31        34        28
                            30        30        32        45        26        21
                            28        25        31        28        20        28
                            27        29                            41        26
                            27        25                            21
                            34

$y_{i\cdot}$               415       373       358       380       354       317
$n_i$                       13        12        10        10        12        11
$\bar{y}_{i\cdot}$     31.9231   31.0833   35.8000   38.0000   29.5000   28.8182
$SS_i$                294.9231  352.9167  253.6000  392.0000  397.0000  225.6364

For illustration purposes, suppose the researcher is interested in answering the following
questions about linear combinations of the task means:

a) Test $H_0: \mu_3 = 30$ vs $H_a: \mu_3 \neq 30$.

b) Find a 95% confidence interval for $\mu_1$.

c) Test $H_0: \mu_4 = \mu_5$ vs $H_a: \mu_4 \neq \mu_5$.

d) Test $H_0: \mu_1 = (\mu_2 + \mu_3 + \mu_4)/3$ vs $H_a: \mu_1 \neq (\mu_2 + \mu_3 + \mu_4)/3$.

e) Obtain a 90% confidence interval for $4\mu_1 - \mu_3 - \mu_4 - \mu_5 - \mu_6$.

These questions can be answered by applying the results of this section. 

Part a result: A t-statistic for testing $H_0: \mu_3 = 30$ is obtained by substituting into
Equation 1.6 to obtain

$$t_c = \frac{\hat{\mu}_3 - 30}{\widehat{\text{s.e.}}(\hat{\mu}_3)} = \frac{\hat{\mu}_3 - 30}{\sqrt{\hat{\sigma}^2/n_3}} = \frac{35.8 - 30.0}{\sqrt{30.9045/10}} = 3.30$$

The significance probability of this calculated value of $t$ is $\hat{\alpha} = \Pr\{|t_c| > 3.30\} = 0.0016$
where $\Pr\{|t_c| > 3.30\}$ is the area to the right of 3.30 plus the area to the left of $-3.30$ in a
t-distribution with 62 degrees of freedom. The above value of $\hat{\alpha}$ was obtained from computer
output, but it can also be obtained from some special hand-held calculators. Readers
of this book who lack access to a computer or a calculator should compare $t_c = 3.30$ to $t_{\alpha/2,62}$ for
their choice of $\alpha$.

Part b result: A 95% confidence interval for $\mu_1$ is given by

$$\hat{\mu}_1 \pm t_{0.025,62} \, \widehat{\text{s.e.}}(\hat{\mu}_1) = 31.923 \pm 2.00\sqrt{30.9045/13} = 31.923 \pm 2.00 \times 1.542$$

Thus the 95% confidence interval about $\mu_1$ is $28.839 < \mu_1 < 35.007$ and we are 95% confident
that this interval contains the true, but unknown value of $\mu_1$.

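Part c result: To test $H_0: \mu_4 = \mu_5$, let $l_1 = \mu_4 - \mu_5$, then $\hat{l}_1 = \hat{\mu}_4 - \hat{\mu}_5 = 38.0 - 29.5 = 8.5$ and

$$\widehat{\text{s.e.}}(\hat{l}_1) = \sqrt{\hat{\sigma}^2 \sum_{i=1}^{6} c_i^2/n_i} = \sqrt{30.9045\left(\frac{1}{10} + \frac{1}{12}\right)} = 2.380$$

since $c_1 = c_2 = c_3 = c_6 = 0$, $c_4 = 1$, and $c_5 = -1$.
The t-statistic for testing $H_0: \mu_4 = \mu_5$ is

$$t_c = \frac{8.5}{2.380} = 3.57$$

The significance probability for this test is $\hat{\alpha} = 0.0007$.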

Part d result: A test of $H_0: \mu_1 = (\mu_2 + \mu_3 + \mu_4)/3$ is equivalent to testing
$H_0: \mu_1 - \frac{1}{3}\mu_2 - \frac{1}{3}\mu_3 - \frac{1}{3}\mu_4 = 0$ or testing $H_0: 3\mu_1 - \mu_2 - \mu_3 - \mu_4 = 0$. By choosing the last version, the computations
are somewhat easier and the value of the $t_c$ test statistic is invariant with respect to a
constant multiplier.

Let $l_2 = 3\mu_1 - \mu_2 - \mu_3 - \mu_4$, then

$$\hat{l}_2 = 3\hat{\mu}_1 - \hat{\mu}_2 - \hat{\mu}_3 - \hat{\mu}_4 = 3(31.923) - 31.083 - 35.8 - 38.0 = -9.114$$

The estimate of the standard error of $\hat{l}_2$ is

$$\widehat{\text{s.e.}}(\hat{l}_2) = \sqrt{30.9045\left(\frac{9}{13} + \frac{1}{12} + \frac{1}{10} + \frac{1}{10}\right)} = 5.491$$

A t-statistic for testing $H_0: 3\mu_1 - \mu_2 - \mu_3 - \mu_4 = 0$ is

$$t_c = \frac{-9.114}{5.491} = -1.66$$

The significance probability corresponding to $t_c$ is $\hat{\alpha} = 0.1020$.

Part e result: Let $l_3 = 4\mu_1 - \mu_3 - \mu_4 - \mu_5 - \mu_6$. Then $\hat{l}_3 = -4.426$ and $\widehat{\text{s.e.}}(\hat{l}_3) = 7.0429$. A 90%
confidence interval for $l_3$ is

$$\hat{l}_3 \pm t_{0.05,62} \, \widehat{\text{s.e.}}(\hat{l}_3) = -4.426 \pm 1.671 \times 7.043 = -4.426 \pm 11.769$$

Thus, a 90% confidence interval is $-16.195 < 4\mu_1 - \mu_3 - \mu_4 - \mu_5 - \mu_6 < 7.343$.


1.5 Simultaneous Tests on Several Linear Combinations 

For many situations the researcher wants to test a simultaneous hypothesis about
several linear combinations of the treatment's effects or means. For example, the general
hypothesis involving $k$ linearly independent linear combinations of the treatment means
can be expressed as

$$H_0: \begin{cases} c_{11}\mu_1 + c_{12}\mu_2 + \cdots + c_{1t}\mu_t = a_1 \\ c_{21}\mu_1 + c_{22}\mu_2 + \cdots + c_{2t}\mu_t = a_2 \\ \qquad \vdots \\ c_{k1}\mu_1 + c_{k2}\mu_2 + \cdots + c_{kt}\mu_t = a_k \end{cases} \quad \text{vs} \quad H_a: (\text{not } H_0) \tag{1.8}$$


The results presented in this section are illustrated using vectors and matrices. However,
knowledge of vectors and matrices is not really necessary for readers having access to a computer
with matrix manipulation software, since most computers allow even novice users
to easily carry out matrix computations.

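The hypothesis in Equation 1.8 can be written in matrix notation as

$$H_0: C\mu = a \quad \text{vs} \quad H_a: C\mu \neq a \tag{1.9}$$

where

$$C = \begin{bmatrix} c_{11} & c_{12} & \cdots & c_{1t} \\ c_{21} & c_{22} & \cdots & c_{2t} \\ \vdots & \vdots & & \vdots \\ c_{k1} & c_{k2} & \cdots & c_{kt} \end{bmatrix}, \quad \mu = \begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_t \end{bmatrix}, \quad \text{and} \quad a = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_k \end{bmatrix} \tag{1.10}$$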


It is assumed that the k rows in C were chosen such that they are linearly independent, 
which means that none of the rows in C can be expressed as a linear combination of the 
remaining rows. If the k rows in C are not linearly independent, a subset of the rows that 
are linearly independent can always be selected so that they contain all the necessary 
information about the required hypothesis. 

For example, suppose you have three treatments and you wish to test

$$H_0: \mu_1 - \mu_2 = 0, \quad \mu_1 - \mu_3 = 0 \quad \text{and} \quad \mu_2 - \mu_3 = 0$$

the corresponding $C$ matrix is

$$C = \begin{bmatrix} 1 & -1 & 0 \\ 1 & 0 & -1 \\ 0 & 1 & -1 \end{bmatrix}$$

but the third row of $C$ is the difference between the second row and the first row, hence
the three rows are not linearly independent. In this case, an equivalent hypothesis can be
stated as $H_0: \mu_1 - \mu_2 = 0$ and $\mu_1 - \mu_3 = 0$, since if $\mu_1 - \mu_2 = 0$ and $\mu_1 - \mu_3 = 0$, then $\mu_2 - \mu_3$ must
be equal to 0. The following discussion uses the assumption that the rows of $C$ are linearly
independent.
Denote the vector of sample means by $\hat{\mu}$, then the sampling distribution of $\hat{\mu}$ in matrix
notation is

$$\hat{\mu} \sim N_t(\mu, \sigma^2 D) \quad \text{where} \quad D = \begin{bmatrix} 1/n_1 & 0 & \cdots & 0 \\ 0 & 1/n_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1/n_t \end{bmatrix}$$

This equation is read as follows: The elements of the $t \times 1$ vector $\hat{\mu}$ have a joint sampling
distribution that is the $t$-variate normal distribution with means given by the vector $\mu$ and
with variances and covariances given by the elements in the matrix $\sigma^2 D$. The $i$th diagonal
element of $\sigma^2 D$ is the variance of $\hat{\mu}_i$ and the $(i, j)$th, $i \neq j$, off-diagonal element gives the
covariance between $\hat{\mu}_i$ and $\hat{\mu}_j$.

The sampling distribution of $C\hat{\mu}$ is

$$C\hat{\mu} \sim N_k(C\mu, \sigma^2 CDC')$$

The sum of squares due to deviations from $H_0$ or the sum of squares for testing $H_0: C\mu = a$
is given by

$$SS_{H_0} = (C\hat{\mu} - a)'(CDC')^{-1}(C\hat{\mu} - a) \tag{1.11}$$

and is based on $k$ degrees of freedom, the number of linearly independent rows of $C$. Using
the assumption of normality, the sampling distribution of $SS_{H_0}/\sigma^2$ is that of a noncentral
chi-square with $k$ degrees of freedom. If $H_0$ is true, then $SS_{H_0}/\sigma^2 \sim \chi^2_k$. The statistic for testing $H_0$ is

$$F_c = \frac{SS_{H_0}/k}{\hat{\sigma}^2}$$

The hypothesis $H_0: C\mu = a$ is rejected at the significance level of $\alpha$ if $F_c > F_{\alpha,k,N-t}$ where $F_{\alpha,k,N-t}$
is the upper $\alpha$ critical point of the F-distribution with $k$ numerator degrees of freedom and
$N - t$ denominator degrees of freedom. The result given here is a special case of Theorem
6.3.1 in Graybill (1976).

When $H_0$ is true, then $SS_{H_0}/k$ is an unbiased estimate of $\sigma^2$, which is then compared with
$\hat{\sigma}^2$, which in turn is an unbiased estimate of $\sigma^2$ regardless of whether $H_0$ is true or not.
Thus the F-statistic given above should be close to 1 if $H_0$ is true. If $H_0$ is false, the statistic
$SS_{H_0}/k$ is an unbiased estimate of

$$\sigma^2 + \frac{1}{k}(C\mu - a)'(CDC')^{-1}(C\mu - a)$$

Thus, if $H_0$ is false, the value of the F-statistic should be larger than 1. The hypothesis $H_0$ is
rejected if the calculated F-statistic is significantly larger than 1.


1.6 Example—Tasks and Pulse Rate (Continued) 

The following is a summary of the information from the example in Section 1.4 with the
sample size and mean for each of the six tasks.

Task $i$                     1         2         3         4         5         6
$n_i$                       13        12        10        10        12        11
$\bar{y}_{i\cdot}$     31.9231   31.0833   35.8000   38.0000   29.5000   28.8182


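The pooled estimate of the variance is $\hat{\sigma}^2 = 30.9045$ and it is based on 62 degrees of freedom.
The $D$ matrix associated with the sampling distribution of the vector of estimated means is

$$D = \begin{bmatrix} \frac{1}{13} & 0 & 0 & 0 & 0 & 0 \\ 0 & \frac{1}{12} & 0 & 0 & 0 & 0 \\ 0 & 0 & \frac{1}{10} & 0 & 0 & 0 \\ 0 & 0 & 0 & \frac{1}{10} & 0 & 0 \\ 0 & 0 & 0 & 0 & \frac{1}{12} & 0 \\ 0 & 0 & 0 & 0 & 0 & \frac{1}{11} \end{bmatrix}$$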


Suppose the researcher is interested in simultaneously testing the following hypothesis
involving two linear combinations of the task means:

$$H_0: \mu_4 - \mu_5 = 4 \text{ and } 3\mu_1 - \mu_2 - \mu_3 - \mu_4 = 0 \quad \text{vs} \quad H_a: (\text{not } H_0)$$

The $C$ matrix consists of two rows, one for each of the linear combinations in $H_0$, and the
vector $a$ has two elements as

$$C = \begin{bmatrix} 0 & 0 & 0 & 1 & -1 & 0 \\ 3 & -1 & -1 & -1 & 0 & 0 \end{bmatrix} \quad \text{and} \quad a = \begin{bmatrix} 4 \\ 0 \end{bmatrix}$$


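Preliminary computations needed to provide the value of $SS_{H_0}$ are:

$$C\hat{\mu} - a = \begin{bmatrix} 8.5 - 4 \\ -9.114 - 0 \end{bmatrix} = \begin{bmatrix} 4.500 \\ -9.114 \end{bmatrix}$$

$$CDC' = \begin{bmatrix} \frac{1}{10} + \frac{1}{12} & -\frac{1}{10} \\ -\frac{1}{10} & \frac{9}{13} + \frac{1}{12} + \frac{1}{10} + \frac{1}{10} \end{bmatrix} = \begin{bmatrix} 0.1833 & -0.1000 \\ -0.1000 & 0.9756 \end{bmatrix}$$

$$(CDC')^{-1} = \begin{bmatrix} 5.7776 & 0.5922 \\ 0.5922 & 1.0856 \end{bmatrix}$$

and

$$SS_{H_0} = (C\hat{\mu} - a)'(CDC')^{-1}(C\hat{\mu} - a) = 158.602$$

with 2 degrees of freedom. The test statistic is

$$F_c = \frac{158.602/2}{30.9045} = 2.566$$

The significance probability of this F-statistic is $\hat{\alpha} = \Pr\{F > 2.566\} = 0.0850$.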
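
The matrix computation in Equation 1.11 can be reproduced without a matrix library. The sketch below is an illustration under stated assumptions: the function name `ss_h0` is hypothetical, $D$ is taken to be the diagonal matrix of $1/n_i$, and the linear system is solved by ordinary Gaussian elimination instead of forming the inverse explicitly, which yields the same quadratic form.

```python
def ss_h0(C, mu_hat, n, a):
    """Quadratic form (C mu - a)' (C D C')^{-1} (C mu - a) with D = diag(1/n_i)."""
    k = len(C)
    # v = C mu_hat - a
    v = [sum(c * m for c, m in zip(row, mu_hat)) - a_r for row, a_r in zip(C, a)]
    # M = C D C'
    M = [[sum(ci * cj / ni for ci, cj, ni in zip(C[r], C[s], n)) for s in range(k)]
         for r in range(k)]
    # Solve M x = v by Gaussian elimination with partial pivoting.
    A = [M[r][:] + [v[r]] for r in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for s in range(col, k + 1):
                A[r][s] -= factor * A[col][s]
    x = [0.0] * k
    for r in reversed(range(k)):
        x[r] = (A[r][k] - sum(A[r][s] * x[s] for s in range(r + 1, k))) / A[r][r]
    return sum(vi * xi for vi, xi in zip(v, x))  # v' M^{-1} v

mu_hat = [31.9231, 31.0833, 35.8000, 38.0000, 29.5000, 28.8182]
n = [13, 12, 10, 10, 12, 11]
C = [[0, 0, 0, 1, -1, 0], [3, -1, -1, -1, 0, 0]]
a = [4, 0]

ss = ss_h0(C, mu_hat, n, a)
f_c = (ss / len(C)) / 30.9045  # divide by k, then by the pooled variance
```

Because `a` is an argument, the same function handles hypotheses with nonzero right-hand sides, such as $\mu_4 - \mu_5 = 4$ here.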


1.7 Testing the Equality of All Means 

Often the first hypothesis of interest to most researchers is to test that the means are simultaneously
equal. The hypothesis is $H_0: \mu_1 = \mu_2 = \cdots = \mu_t$ vs $H_a$: (not $H_0$). Two basic procedures
are examined for testing the equal means hypothesis. For the particular situation discussed
in this chapter, the two procedures give rise to the same statistical test. However,
for most messy data situations (for treatment structures other than one-way), the two procedures
can give rise to different tests. The first procedure is covered in this section, while
the second is introduced in Section 1.9.

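The equal means hypothesis, $H_0: \mu_1 = \mu_2 = \cdots = \mu_t$, is equivalent to a hypothesis of the
form $H_0: \mu_1 - \mu_2 = 0, \mu_1 - \mu_3 = 0, \ldots, \mu_1 - \mu_t = 0$, or any other hypothesis that involves $t - 1$
linearly independent linear combinations of the $\mu_i$. The $C$ matrix and $a$ vector corresponding
to the set of $t - 1$ pairwise differences are:

$$C = \begin{bmatrix} 1 & -1 & 0 & 0 & \cdots & 0 \\ 1 & 0 & -1 & 0 & \cdots & 0 \\ 1 & 0 & 0 & -1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \vdots & & \vdots \\ 1 & 0 & 0 & 0 & \cdots & -1 \end{bmatrix} \quad \text{and} \quad a = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$

The $C$ matrix corresponding to the following set of $t - 1$ linearly independent linear combinations
of the $\mu_i$; $H_0: \mu_1 - \mu_2 = 0, \mu_1 + \mu_2 - 2\mu_3 = 0, \mu_1 + \mu_2 + \mu_3 - 3\mu_4 = 0, \ldots, \mu_1 + \mu_2 + \cdots - (t - 1)\mu_t = 0$ is:

$$C = \begin{bmatrix} 1 & -1 & 0 & 0 & \cdots & 0 \\ 1 & 1 & -2 & 0 & \cdots & 0 \\ 1 & 1 & 1 & -3 & \cdots & 0 \\ \vdots & \vdots & \vdots & \vdots & & \vdots \\ 1 & 1 & 1 & 1 & \cdots & -(t - 1) \end{bmatrix} \quad \text{and} \quad a = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$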
Many other matrices exist, so that $C\mu = 0$ if and only if $\mu_1 = \mu_2 = \cdots = \mu_t$; however, all such
matrices produce the same sum of squares for deviations from $H_0$ and the same degrees of
freedom, $t - 1$, and hence the same F-statistic. For this special case Equation 1.11 always
reduces to

$$SS_{H_0: \mu_1 = \mu_2 = \cdots = \mu_t} = \sum_{i=1}^{t} n_i(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 = \sum_{i=1}^{t} \frac{y_{i\cdot}^2}{n_i} - \frac{y_{\cdot\cdot}^2}{N} \tag{1.12}$$


1.8 Example—Tasks and Pulse Rate (Continued) 

For the task and pulse rate data in Section 1.4, $SS_{H_0: \mu_1 = \mu_2 = \cdots = \mu_t}$ is computed using
Equations 1.11 and 1.12.

Using the formula in Equation 1.12 provides

$$SS_{H_0} = \frac{415^2}{13} + \frac{373^2}{12} + \frac{358^2}{10} + \frac{380^2}{10} + \frac{354^2}{12} + \frac{317^2}{11} - \frac{2197^2}{68} = 694.4386$$

with $t - 1 = 5$ degrees of freedom. The value of the $F_c$ statistic is

$$F_c = \frac{694.4386/5}{30.9045} = 4.49$$

and the significance probability is $\hat{\alpha} = 0.0015$.

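Next, using Equation 1.11, the matrix $C$, vector $a$, and matrix $D$ are

$$C = \begin{bmatrix} 1 & -1 & 0 & 0 & 0 & 0 \\ 1 & 0 & -1 & 0 & 0 & 0 \\ 1 & 0 & 0 & -1 & 0 & 0 \\ 1 & 0 & 0 & 0 & -1 & 0 \\ 1 & 0 & 0 & 0 & 0 & -1 \end{bmatrix}, \quad a = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}$$

and

$$D = \begin{bmatrix} \frac{1}{13} & 0 & 0 & 0 & 0 & 0 \\ 0 & \frac{1}{12} & 0 & 0 & 0 & 0 \\ 0 & 0 & \frac{1}{10} & 0 & 0 & 0 \\ 0 & 0 & 0 & \frac{1}{10} & 0 & 0 \\ 0 & 0 & 0 & 0 & \frac{1}{12} & 0 \\ 0 & 0 & 0 & 0 & 0 & \frac{1}{11} \end{bmatrix}$$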
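Next compute the individual quantities in Equation 1.11 as

$$C\hat{\mu} - a = \begin{bmatrix} 0.840 \\ -3.877 \\ -6.077 \\ 2.423 \\ 3.105 \end{bmatrix} \quad \text{and} \quad CDC' = \begin{bmatrix} \frac{25}{156} & \frac{1}{13} & \frac{1}{13} & \frac{1}{13} & \frac{1}{13} \\ \frac{1}{13} & \frac{23}{130} & \frac{1}{13} & \frac{1}{13} & \frac{1}{13} \\ \frac{1}{13} & \frac{1}{13} & \frac{23}{130} & \frac{1}{13} & \frac{1}{13} \\ \frac{1}{13} & \frac{1}{13} & \frac{1}{13} & \frac{25}{156} & \frac{1}{13} \\ \frac{1}{13} & \frac{1}{13} & \frac{1}{13} & \frac{1}{13} & \frac{24}{143} \end{bmatrix}$$

The inverse of $CDC'$ is

$$(CDC')^{-1} = \begin{bmatrix} 9.882 & -1.765 & -1.765 & -2.118 & -1.941 \\ -1.765 & 8.529 & -1.471 & -1.765 & -1.618 \\ -1.765 & -1.471 & 8.529 & -1.765 & -1.618 \\ -2.118 & -1.765 & -1.765 & 9.882 & -1.941 \\ -1.941 & -1.618 & -1.618 & -1.941 & 9.221 \end{bmatrix}$$

Finally, the value of the sum of squares is

$$SS_{H_0} = (C\hat{\mu} - a)'(CDC')^{-1}(C\hat{\mu} - a) = 694.4386$$

which is the same as the sum of squares computed using Equation 1.12.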

Clearly, this formula is not easy to use if one must do the calculations by hand. However, 
in many messy data situations, formulas such as this one are necessary in order to obtain 
the statistic to test meaningful hypotheses. Fortunately, by utilizing computers, C matrices 
can be constructed for a specific hypothesis and then one can allow the computer to do the 
tedious calculations. 
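
For the equal means hypothesis the shortcut in Equation 1.12 avoids matrices altogether, since it needs only the treatment totals and sample sizes. The short sketch below, with illustrative variable names, evaluates it for the task and pulse rate data using the totals and pooled variance reported in Section 1.4.

```python
totals = [415, 373, 358, 380, 354, 317]  # treatment totals y_i. from Table 1.1
n = [13, 12, 10, 10, 12, 11]             # sample sizes n_i
sigma2_hat = 30.9045                     # pooled variance estimate

N = sum(n)
grand_total = sum(totals)

# Equation 1.12: sum of y_i.^2 / n_i minus y..^2 / N
ss_equal_means = sum(y * y / ni for y, ni in zip(totals, n)) - grand_total ** 2 / N
df = len(n) - 1
f_c = (ss_equal_means / df) / sigma2_hat
```

The result can be compared with the value obtained from the quadratic form in Equation 1.11.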


1.9 General Method for Comparing Two Models—The Principle of 
Conditional Error 

A second procedure for computing a test statistic compares the fit of two models. In this
section, the two models compared are $y_{ij} = \mu_i + \varepsilon_{ij}$, which is the general or unreduced model,
and $y_{ij} = \mu + \varepsilon_{ij}$, which is the model one would have if $H_0: \mu_1 = \mu_2 = \cdots = \mu_t = \mu$ (say) were
true. The first model is called the full model or the unrestricted model, while the second
model is called the reduced model or the restricted model.

The principle known as the principle of conditional error is used to compare two models 
where one model is obtained by placing restrictions upon the parameters of another model. 
The principle is very simple, requiring that one obtain the residual or error sums of squares 
for both the full model and the reduced model. Let $ESS_F$ denote the error sum of squares
after fitting the full model and $ESS_R$ denote the error sum of squares after fitting the
reduced model. Then the sum of squares due to the restrictions given by the hypothesis or
deviations from the null hypothesis is $SS_{H_0} = ESS_R - ESS_F$. The degrees of freedom for both
$ESS_R$ and $ESS_F$ are given by the difference between the total number of observations in the
data set and the number of (essential) parameters to be estimated (essential parameters
will be discussed in Chapter 6). Denote the degrees of freedom corresponding to $ESS_R$ and
$ESS_F$ by $df_R$ and $df_F$, respectively. The number of degrees of freedom corresponding to $SS_{H_0}$
is $df_{H_0} = df_R - df_F$. An F-statistic for testing $H_0$ is given by

$$F_c = \frac{SS_{H_0}/df_{H_0}}{ESS_F/df_F}$$

One rejects $H_0$ at the significance level $\alpha$ if $F_c > F_{\alpha, df_{H_0}, df_F}$.

For the case discussed above, $y_{ij} = \mu_i + \varepsilon_{ij}$ is the full model and $y_{ij} = \mu + \varepsilon_{ij}$ is the reduced
model. The error sum of squares for the full model is

$$ESS_F = \sum_{i=1}^{t} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i\cdot})^2 = (N - t)\hat{\sigma}^2$$

with $df_F = N - t$, and the error sum of squares for the reduced model is

$$ESS_R = \sum_{i=1}^{t} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{\cdot\cdot})^2$$

with $df_R = N - 1$. Thus the sum of squares due to deviations from $H_0$ is

$$SS_{H_0: \mu_1 = \mu_2 = \cdots = \mu_t} = ESS_R - ESS_F = \sum_{i=1}^{t} n_i(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2$$

with $t - 1$ degrees of freedom. This is the same sum of squares as was obtained in
Equation 1.12.

The sums of squares that are of interest in testing situations are often put in a table called 
an analysis of variance table. Such a table often has a form similar to that in Table 1.2. The 
entries under the column "Source of variation" are grouped into sets. In a given situation 
only one of the labels in each set is used, with the choice being determined entirely by the 
experimenter. 


TABLE 1.2

Analysis of Variance Table for One-Way Model to Test Equality of the Means

Source of Variation                          df         SS            MS                               F-test
$H_0: \mu_1 = \mu_2 = \cdots = \mu_t$  }
Treatments                             }     $t - 1$    $SS_{H_0}$    $SS_{H_0}/(t - 1)$               $[SS_{H_0}/(t - 1)]/\hat{\sigma}^2$
between samples                        }
Error                                  }     $N - t$    $ESS_F$       $\hat{\sigma}^2 = ESS_F/(N - t)$
within samples                         }

Note: df = degrees of freedom, SS = sum of squares, and MS = mean square. These
standard abbreviations are used throughout the book.
The principle of conditional error is also referred to as the model comparison procedure
and the process is quite flexible. For example, if you are interested in testing a hypothesis
for the task and pulse rate data, like $H_0: \mu_1 = \mu_2 = \mu_3$ vs $H_a$: (not $H_0$), then the model under
the conditions of $H_0$ has the form

$$y_{ij} = \mu_0 + \varepsilon_{ij} \quad \text{for } i = 1, 2, 3$$
$$y_{ij} = \mu_i + \varepsilon_{ij} \quad \text{for } i = 4, 5, 6$$

that is, the model has equal means for the first three tasks and different means for the last
three treatments. Such a model can be fit using most software packages where a qualitative
or class variable is defined to have the value of 0 for tasks 1, 2, and 3 and the value of the task
number for tasks 4, 5, and 6.


1.10 Example—Tasks and Pulse Rate (Continued) 

The principle of conditional error is applied to the task and pulse rate data of Section 1.4 to
provide a test of the equal means hypothesis, $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5 = \mu_6$ vs $H_a$: (not $H_0$).
The error sum of squares for the full model is $ESS_F = 1916.076$ with $df_F = 62$. The error sum
of squares for the reduced model is $ESS_R = 73{,}593 - (2197)^2/68 = 2610.515$ with $df_R = 67$.
Hence $SS_{H_0} = 2610.515 - 1916.076 = 694.439$ with $df_{H_0} = 67 - 62 = 5$. The analysis of variance
table summarizing these computations is displayed in Table 1.3.
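
The model comparison idea can be sketched from summary statistics alone. In the sketch below, a reduced model is described by listing which treatments share a common mean; the error sum of squares of a merged group is the sum of its within-treatment $SS_i$ plus the spread of the member means around the pooled mean. The function names `error_ss` and `compare_models` and the 0-based task indexing are illustrative assumptions; the totals, sample sizes, and $SS_i$ values are those of Table 1.1.

```python
def error_ss(groups, totals, n, within):
    """Error SS when all treatments listed in one group share a common mean."""
    ess = 0.0
    for members in groups:
        total = sum(totals[i] for i in members)
        size = sum(n[i] for i in members)
        # Spread of the member means around their pooled mean.
        between = sum(totals[i] ** 2 / n[i] for i in members) - total ** 2 / size
        ess += sum(within[i] for i in members) + between
    return ess

def compare_models(full, reduced, totals, n, within):
    """Return (SS_H0, df_H0, F_c) from the principle of conditional error."""
    N = sum(n)
    ess_f = error_ss(full, totals, n, within)
    ess_r = error_ss(reduced, totals, n, within)
    df_f, df_r = N - len(full), N - len(reduced)
    ss_h0, df_h0 = ess_r - ess_f, df_r - df_f
    return ss_h0, df_h0, (ss_h0 / df_h0) / (ess_f / df_f)

# Summary values from Table 1.1 (tasks indexed 0..5 here).
totals = [415, 373, 358, 380, 354, 317]
n = [13, 12, 10, 10, 12, 11]
within = [294.9231, 352.9167, 253.6000, 392.0000, 397.0000, 225.6364]

full = [[0], [1], [2], [3], [4], [5]]
# Reduced model with one common mean (equal means hypothesis).
equal_means = compare_models(full, [[0, 1, 2, 3, 4, 5]], totals, n, within)
# Reduced model with a common mean for tasks 1, 2, and 3 only.
first_three = compare_models(full, [[0, 1, 2], [3], [4], [5]], totals, n, within)
```

The first call corresponds to the equal means test of this section; the second corresponds to the hypothesis $\mu_1 = \mu_2 = \mu_3$ used as an illustration in Section 1.9.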


TABLE 1.3

Analysis of Variance Table to Test Equality of the Means for the Task and Pulse Rate Data

Source of Variation     df    SS           MS          F       $\hat{\alpha}$
Due to $H_0$             5    694.439      138.888     4.49    0.0015
Error                   62    1,916.076    30.9045


1.11 Computer Analyses

This chapter concludes with some remarks about utilizing computers and statistical
computing packages such as SAS®, BMDP®, SYSTAT®, JMP®, and SPSS®. All of the methods
and formulas provided in the preceding sections can easily be used on most computers. If
the computer utilizes a programming language such as MATLAB, SAS-IML, or APL, the
required matrix calculations are simple to do by following the matrix formulas given in
the preceding sections. SAS, JMP, BMDP, SYSTAT, and SPSS each contain procedures that
enable users to generate their own linear combinations of treatment means about which to
test hypotheses. In addition, these packages all provide an analysis of variance table, treatment
means, and their standard errors. Table 1.4 contains SAS-GLM code with estimate
and contrast statements needed to test hypotheses described for the task and pulse data.
The estimate statement is used to evaluate one linear combination of the means and the
provided results are the estimate of the contrast, its estimated standard error, and the
resulting t-statistic with its corresponding significance level. The contrast statement is
used to evaluate one or more linear combinations of the means and the provided results
are the sums of squares, degrees of freedom, and the resulting F-statistic. For both the
estimate and contrast statements in SAS-GLM, the only values of $a$ in the hypotheses are
zero, that is, one can only test the linear combinations of means that are equal to zero.


TABLE 1.4

Proc GLM Code to Fit the Task and Pulse Rate Data with Estimate and Contrast
Statements Needed to Provide the Analysis Described in the Text

PROC GLM DATA=EX1; CLASS TASK;
  /* means model: NOINT drops the intercept so the TASK parameters are the task means */
  MODEL PULSE20=TASK/NOINT SOLUTION E;
  /* single linear combinations: estimate, standard error, and t-statistic */
  ESTIMATE 'Ho: M4=M5' TASK 0 0 0 1 -1 0;
  ESTIMATE 'Ho: 3M1=M2+M3+M4' TASK 3 -1 -1 -1 0 0;
  ESTIMATE 'Ho: 3M1=M2+M3+M4_mn' TASK 3 -1 -1 -1 0 0/DIVISOR=3;
  ESTIMATE '4M1-M3-M4-M5-M6_mn' TASK 4 0 -1 -1 -1 -1/DIVISOR=4;
  /* one or more linear combinations: sum of squares, df, and F-statistic */
  CONTRAST '4M1-M3-M4-M5-M6_mn' TASK 4 0 -1 -1 -1 -1;
  CONTRAST 'M4=M5 & 3M1=M2+M3+M4' TASK 0 0 0 1 -1 0, TASK 3 -1 -1 -1 0 0;
  CONTRAST 'EQUAL MEANS 1'
    TASK 1 -1 0 0 0 0, TASK 1 0 -1 0 0 0, TASK 1 0 0 -1 0 0,
    TASK 1 0 0 0 -1 0, TASK 1 0 0 0 0 -1;
RUN;


Table 1.5 contains SAS-IML code to provide the computations for the hypotheses being
tested in Section 1.6. By constructing the code in a matrix language, one can obtain a test
of any hypothesis of the form $C\mu = a$.


TABLE 1.5

Proc IML Code to Carry Out the Computations for the Task and Pulse Data in Section 1.6

proc iml;
  dd={13 12 10 10 12 11};              /* sample sizes n_i */
  d=diag(dd);                          /* inv(d) is the D matrix of the text */
  c={0 0 0 1 -1 0, 3 -1 -1 -1 0 0};
  muhat=t({31.9231 31.0833 35.8000 38.0000 29.5000 28.8182});
  s2=30.90445;                         /* pooled variance estimate */
  a={4,0};
  cmua=c*muhat - a;
  cdc=c*inv(d)*t(c);
  cdci=inv(cdc);
  ssho=t(cmua)*cdci*cmua;              /* Equation 1.11 */
  f=ssho/(2*s2); al=1-probf(f,2,62);   /* F-statistic and significance probability */
  print dd d cmua cdc cdci ssho f al;
quit;


1.12 Concluding Remarks 

In this chapter, the analysis of the one-way analysis of variance model was described. 
General procedures for making statistical inferences about the effects of different treatments 
were provided and illustrated for the case of homogeneous errors. Two basic procedures
for obtaining statistical analyses of experimental design models were introduced. These
procedures are used extensively throughout the remainder of the book for more complex 
models used to describe designed experiments and for messier data situations. A test for 
comparing all treatment effect means simultaneously was also given. Such a test may be 
considered an initial step in a statistical analysis. The procedures that should be used to 
complete the analysis of a data set could depend on whether the hypothesis of equal
treatment means is rejected.


1.13 Exercises 

1.1 A company studied five techniques of assembling a part. Forty workers were
randomly selected from the worker population and eight were randomly
assigned to each technique. Each worker assembled a part and the measurement
was the amount of time in seconds required to complete the assembly. Some
workers did not complete the task.


Data for Comparing Techniques of Assembling a Part for Exercise 1.1

 Technique 1     Technique 2     Technique 3     Technique 4     Technique 5
Worker  Time    Worker  Time    Worker  Time    Worker  Time    Worker  Time
   1    45.6       7    41.0      12    51.7      19    67.5      26    57.1
   2    41.0       8    49.1      13    60.1      20    57.7      27    69.6
   3    46.4       9    49.2      14    52.6      21    58.2      28    62.7
   4    50.7      10    54.8      15    58.6      22    60.6
   5    47.9      11    45.0      16    59.8      23    57.3
   6    44.6                      17    52.6      24    58.3
                                  18    53.8      25    54.8




1) Write down a model appropriate to describe the data. Describe each component
of the model.

2) Estimate the parameters of the model in part 1.

3) Construct a 95% confidence interval about $\mu_1 - \mu_2$.

4) Use a t-statistic to test $H_0$: $\mu_1 + \mu_2 - \mu_3 - \mu_4 = 0$ vs $H_a$: (not $H_0$).

5) Use an F-statistic to test $H_0$: $\mu_1 + \mu_2 - \mu_3 - \mu_5 = 0$ vs $H_a$: (not $H_0$).

6) Use a t-statistic to test $H_0$: $(\mu_1 + \mu_2 + \mu_3)/3 = (\mu_4 + \mu_5)/2$ vs $H_a$: (not $H_0$).

7) Use an F-statistic to test $H_0$: $\mu_1 = \mu_2 = \mu_3$ vs $H_a$: (not $H_0$).

8) Use an F-statistic to test $H_0$: $(\mu_1 + \mu_2 + \mu_3)/3 = (\mu_4 + \mu_5)/2$, $(\mu_1 + \mu_2 + \mu_6)/3 =
(\mu_3 + \mu_4 + \mu_5)/3$, and $(\mu_1 + \mu_4 + \mu_5)/3 = (\mu_3 + \mu_6)/2$ vs $H_a$: (not $H_0$).

1.2 Five rations were evaluated as to their ability to enable calves to grow. Thirty-one 
calves were used in the study. A mistake in the feeding of the rations produced 
unbalanced distributions of the calves to the rations. The data recorded was the 
number of pounds of weight gained over the duration of the study. 

1) Write down a model appropriate to describe the data. Describe each component
of the model.


Gain Data for Comparing Rations of Exercise 1.2

  Ration 1       Ration 2       Ration 3       Ration 4       Ration 5
Calf   Gain    Calf   Gain    Calf   Gain    Calf   Gain    Calf   Gain
  1    825      10    874      19    861      21    829      23    837
  2    801      11    854      20    856      22    814      24    851
  3    790      12    883                                    25    824
  4    809      13    839                                    26    781
  5    830      14    836                                    27    810
  6    825      15    839                                    28    847
  7    839      16    840                                    29    826
  8    835      17    834                                    30    832
  9    872      18    894                                    31    830


2) Estimate the parameters of the model in part 1.

3) Construct a 95% confidence interval about $\mu_1 + \mu_2 - 2\mu_3$.

4) Use a t-statistic to test $H_0$: $\mu_1 + \mu_2 - 2\mu_3 = 0$ vs $H_a$: (not $H_0$).

5) Use an F-statistic to test $H_0$: $2\mu_2 - \mu_1 - \mu_5 = 0$ vs $H_a$: (not $H_0$).

6) Use a t-statistic to test $H_0$: $(\mu_1 + \mu_2 + \mu_3)/3 = (\mu_4 + \mu_5)/2$ vs $H_a$: (not $H_0$).

7) Use an F-statistic to test $H_0$: $\mu_1 = \mu_2$ and $\mu_3 = \mu_4$ vs $H_a$: (not $H_0$).

8) Use an F-statistic to test $H_0$: $\mu_1 + \mu_2 - 2\mu_3 = 0$, $2\mu_2 - \mu_1 - \mu_5 = 0$, $(\mu_1 + \mu_2 + \mu_3)/3 =
(\mu_4 + \mu_5)/2$ vs $H_a$: (not $H_0$).

1.3 A study was conducted to evaluate the effect of elevation on the lung volume of 
birds raised at specified elevations. Thirty-five environmental chambers which 
could simulate elevations by regulating the air pressure were used. The five 
effective elevations were each randomly assigned to seven chambers and 35 
baby birds were randomly assigned to the chambers, one per chamber. When 
the birds reached adult age, their lung volumes were measured. The data table 
contains the effective elevations and the volumes of the birds. Three birds did 
not survive the study, thus producing missing data. 


Lung Volumes for Birds Raised at Different Simulated Elevations

Elevation 1000 ft  Elevation 2000 ft  Elevation 3000 ft  Elevation 4000 ft  Elevation 5000 ft
 Bird   Volume      Bird   Volume      Bird   Volume      Bird   Volume      Bird   Volume
   1     156          8     160         15     156         22     168         29     177
   2     151          9     160         16     173         23     167         30     170
   3     161         12     154         18     165         24     171         31     169
   4     152         13     152         19     172         25     173         32     176
   5     164         14     153         20     169         26     167         33     183
   6     153                            21     168         27     167         34     178
   7     163                                               28     173         35     174


1) Write down a model appropriate to describe the data. Describe each component
of the model.

2) Estimate the parameters of the model in part 1.

3) Determine if there is a linear trend in the lung volume as elevation increases
by testing $H_0$: $-2\mu_1 - \mu_2 + 0\mu_3 + \mu_4 + 2\mu_5 = 0$ vs $H_a$: (not $H_0$) (coefficients were
obtained from a table of orthogonal polynomials for equally spaced values
(Beyer, 1966, p. 367)).

4) Determine if there is a quadratic trend in the lung volume as elevation
increases by testing $H_0$: $2\mu_1 - \mu_2 - 2\mu_3 - \mu_4 + 2\mu_5 = 0$ vs $H_a$: (not $H_0$).

5) Determine if the assumption of a linear/quadratic response to elevation is
appropriate by simultaneously testing the cubic and quartic trends to be
zero by testing $H_0$: $-1\mu_1 + 2\mu_2 + 0\mu_3 - 2\mu_4 + 1\mu_5 = 0$, $1\mu_1 - 4\mu_2 + 6\mu_3 - 4\mu_4 + 1\mu_5 = 0$
vs $H_a$: (not $H_0$).

6) Use a t-statistic to test $H_0$: $(\mu_1 + \mu_2 + \mu_3)/3 = (\mu_4 + \mu_5)/2$ vs $H_a$: (not $H_0$).

7) Use an F-statistic to test $H_0$: $\mu_1 = \mu_2 = \mu_3$ and $\mu_4 = \mu_5$ vs $H_a$: (not $H_0$).





One-Way Treatment Structure in a 
Completely Randomized Design Structure 
with Heterogeneous Errors 


In this chapter, the case is considered where the treatments assigned to the experimental
units may affect the variance of the responses as well as the mean. Start with the one-way
means model, $y_{ij} = \mu_i + \varepsilon_{ij}$ for $i = 1, 2, \ldots, t$, $j = 1, 2, \ldots, n_i$. In Chapter 1 it was assumed that
the experimental errors all had the same variance; that is, the treatments were expected to
possibly change the mean of the population being sampled, but not the variance. In this
chapter, some methods are described for analyzing data when the treatments affect the
variances as well as the mean. The types of questions that the experimenter should want to
answer about the means in this setting are similar to those in Chapter 1. That is,

1) Are all means equal? 

2) Can pairwise comparisons among the means be made? 

3) Can a hypothesis of the form $\sum_{i=1}^{t} c_i \mu_i = a$ be tested and can confidence
intervals be constructed about $\sum_{i=1}^{t} c_i \mu_i$?

In addition, there are also questions about the variances that may be of interest, such as 

1) Are all of the variances equal? 

2) Are there groupings of the treatments where within a group the variances are 
equal and between groups the variances are not equal? 

Before questions about the means of the model can be answered, an appropriate description
of the variances of the treatments must be obtained.

Tests of homogeneity of variances are used to answer questions about the variances
of the data from the respective treatments. If there are two treatments, the problem of
comparing means when there are unequal variances is usually known as the Behrens-Fisher
problem. Also, heterogeneous error variances pose a much more serious problem
when ignored than non-normality of the errors. The procedures in Chapter 1 are
robust with respect to non-normality, but not quite so robust with respect to heterogeneous
error variances. In the analyses previously considered, it was assumed that the population
variances were all equal, which is a reasonable assumption in many cases. One method for
analyzing data when variances are unequal is simply to ignore the fact that they are unequal
and calculate the same F-statistics or t-tests that are calculated in the case of equal
variances. Surprisingly perhaps, simulation studies have shown that these usual tests are quite
good, particularly if the sample sizes are all equal or almost equal. Also, if the larger
sample sizes correspond to the treatments or populations with the larger variances, then the
tests computed with the equal variance assumption are also quite good. The usual tests are
so good, in fact, that many statisticians do not even recommend testing for equal variances.
Others attempt to find a transformation that will stabilize the treatment variances, that is,
transform the data such that the treatment variances are equal. When the variances are not
equal, there are techniques to make comparisons about the means in the framework of the
unequal variance model.

Procedures for testing the equality of treatment variances are described for the one-way 
model and procedures for analyzing the treatment means when the variances are unequal 
are described in the following sections. These procedures should be used when the usual 
techniques are suspect. The unequal variance model is described next. 


2.1 Model Definitions and Assumptions 

The unequal variance model is 

$$y_{ij} = \mu_i + \varepsilon_{ij} \text{ for } i = 1, 2, \ldots, t,\ j = 1, 2, \ldots, n_i \text{ and } \varepsilon_{ij} \sim \text{independent } N(0, \sigma_i^2) \tag{2.1}$$

The notation $\varepsilon_{ij} \sim$ independent $N(0, \sigma_i^2)$ means that the errors, $\varepsilon_{ij}$, are all independent,
normally distributed and the variance of each normal distribution depends on $i$ and may be
different for each population or treatment.


2.2 Parameter Estimation 

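The best estimates of the parameters in the model are:

$$\hat{\mu}_i = \sum_{j=1}^{n_i} y_{ij}/n_i = \bar{y}_{i\cdot}, \quad i = 1, 2, \ldots, t$$

and

$$\hat{\sigma}_i^2 = \frac{\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i\cdot})^2}{n_i - 1}, \quad i = 1, 2, \ldots, t$$

The sampling distributions associated with the parameter estimates are

$$\hat{\mu}_i \sim \text{independent } N(\mu_i, \sigma_i^2/n_i), \quad i = 1, 2, \ldots, t$$

and

$$\frac{(n_i - 1)\hat{\sigma}_i^2}{\sigma_i^2} \sim \text{independent } \chi^2_{n_i - 1}, \quad i = 1, 2, \ldots, t$$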

These sampling distributions are used as the basis for establishing tests for equality of 
variances and for providing the analysis of the means when the variances are unequal. 


2.3 Tests for Homogeneity of Variances 

In this section, five procedures are described for testing the equal variances hypothesis,

$$H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_t^2 \quad \text{vs} \quad H_a: (\text{not } H_0)$$

Before the analysis of the means is attempted, the equal variance hypothesis should be
investigated. If there is not enough evidence to conclude the variances are not equal, then
the equal variance model in Chapter 1 can be used to investigate the means. If there is
sufficient evidence to believe the variances are unequal, then the procedures described in
Section 2.5 should be used to provide an analysis of the means in the unequal variance
framework. The recommendation is to use the unequal variance model when the equal
variance hypothesis is rejected at $\alpha \le 0.01$.

2.3.1 Hartley's F -Max Test 

The first test described is known as Hartley's F-max test (1950). This test requires that all
samples be of the same size, that is, $n_1 = n_2 = \cdots = n_t$. The test is based on the statistic

$$F_{\max} = \frac{\max_i \{\hat{\sigma}_i^2\}}{\min_i \{\hat{\sigma}_i^2\}}$$

Percentage points of $F_{\max}$ are provided in the Appendix in Table A.1 for $\alpha$ = 0.05 and 0.01.
The null hypothesis, $H_0$, is rejected if $F_{\max} > F_{\max,\alpha,\nu,k}$, where $\nu = n - 1$ is the degrees of freedom
associated with each of the $k$ individual treatment variances. If the $n_i$ are not all equal, a
"liberal" test of $H_0$ vs $H_a$ can be obtained by taking $\nu = \max\{n_i\} - 1$. This test is liberal in the
sense that all treatments are assumed to have the same (maximum) sample size, so the null
hypothesis is rejected more often than specified by the choice of $\alpha$. When the
sample sizes are not too unequal, this process provides a reasonable test. It also protects one
from doing the usual analysis of variance when there is even a remote chance of it being
inappropriate. An example illustrating the use of this test is found in Section 2.4.


2.3.2 Bartlett's Test

A second test for homogeneity of variances is a test proposed by Bartlett (1937),
which has the advantage of not requiring the $n_i$ to be equal. Bartlett's test statistic is

$$U = \frac{1}{C}\left[\nu \log_e(\hat{\sigma}^2) - \sum_{i=1}^{t} \nu_i \log_e(\hat{\sigma}_i^2)\right] \tag{2.2}$$

where

$$\nu_i = n_i - 1, \quad \nu = \sum_{i=1}^{t} \nu_i, \quad \hat{\sigma}^2 = \sum_{i=1}^{t} \nu_i \hat{\sigma}_i^2 / \nu$$

and

$$C = 1 + \frac{1}{3(t - 1)}\left(\sum_{i=1}^{t} \frac{1}{\nu_i} - \frac{1}{\nu}\right)$$

The hypothesis of equal variances is rejected if $U > \chi^2_{\alpha,t-1}$. One of the disadvantages of the
preceding two tests for homogeneity of variance is that they are quite sensitive to departures
from normality as well as to departures from the equal variances assumption. Most
of the following tests are more robust to departures from normality.


2.3.3 Levene's Test 

Levene (1960) proposed doing a one-way analysis of variance on the absolute values of
the residuals from the one-way means or effects model. The absolute values of the residuals
are given by $z_{ij} = |y_{ij} - \bar{y}_{i\cdot}|$, $i = 1, 2, \ldots, t$; $j = 1, 2, \ldots, n_i$. The F-test from the analysis of
variance provides a test of the equality of the treatment means of the absolute values of
the residuals. If the means are different, then there is evidence that the residuals for one
treatment are on the average larger than the residuals for another treatment. The means
of the absolute values of the residuals can provide a guide as to which variances are
not equal and a multiple comparison test (see Chapter 3) can be used to make pairwise
comparisons among these means. One modification of Levene's test is to use the squared
residuals in the analysis of variance.


2.3.4 Brown and Forsythe's Test 

Brown and Forsythe (1974) used Levene's process and modified it by doing a one-way
analysis of variance on the absolute values of the deviations of the observations from
the median of each treatment. The absolute values of the deviations from the medians are
given by $m_{ij} = |y_{ij} - y_{i\,\mathrm{med}}|$, $i = 1, 2, \ldots, t$; $j = 1, 2, \ldots, n_i$. The F-test from the analysis of
variance provides a test of the equality of the treatment means of the absolute values of the
deviations. If the means are different, then there is evidence that the deviations for one
treatment are on the average larger than the deviations for another treatment. The means
of the absolute values of the deviations from the medians can provide a guide as to which
variances are not equal, and a multiple comparison test can be used to make pairwise
comparisons among these means. This use of the deviations from the medians provides more
powerful tests than Levene's when the data are not symmetrically distributed.


2.3.5 O'Brien's Test

O'Brien (1979) computed scores as

$$r_{ij} = [(w + n_i - 2)\,n_i\,(y_{ij} - \bar{y}_{i\cdot})^2 - w\,\hat{\sigma}_i^2\,(n_i - 1)]/[(n_i - 1)(n_i - 2)] \tag{2.3}$$

where $w$ is a weight parameter. The procedure is to carry out an analysis of variance on the
computed score values. When $w$ = 0.5, the means of the scores are the sample variances,
$\hat{\sigma}_i^2$, thus the comparison of the means of the scores is a comparison of the variances of
the data.

There are several other procedures that can be used to test the equality of variances or
the equality of scale parameters using parametric and nonparametric methods (Conover
et al., 1981; Olejnik and Algina, 1987). McGaughey (2003) proposes a test that uses the
concept of data depth and applies the procedure to univariate and multivariate populations.
Data depth is beyond the scope of this book.


2.3.6 Some Recommendations 

Conover et al. (1981) and Olejnik and Algina (1987) conducted simulation studies of
homogeneity of variance tests that included the ones above as well as numerous others. The
studies indicate that no test is robust and most powerful for all situations. Levene's test
was one of the better tests studied by Conover et al. O'Brien's test seems to provide an
appropriate size test without losing much power according to Olejnik and Algina. The
Brown-Forsythe test seems to be better when distributions have heavy tails. Based on
their results, we make the following recommendations:

1) If the distributions have heavy tails, use the Brown-Forsythe test. 

2) If the distributions are somewhat skewed, use the O'Brien test. 

3) If the data are nearly normally distributed, then any of the tests are appropriate, 
including Bartlett's and Hartley's tests. 

Levene's and O'Brien's tests can easily be tailored for use in designed experiments that
involve more than one factor, including an analysis of covariance (Milliken and Johnson,
2002). Levene's, O'Brien's and Brown-Forsythe's tests were shown to be nearly as good as
Bartlett's and Hartley's tests for normally distributed data, and superior to them for
non-normally distributed data. Conover et al. and Olejnik and Algina discuss some
nonparametric tests, but they are more difficult to calculate and the above recommended tests
perform almost as well. An example follows where each of the tests for equality of
variances is demonstrated.


2.4 Example—Drugs and Errors 

The data in Table 2.1 are from a paired-association learning task experiment performed on
subjects under the influence of two possible drugs. Group 1 is a control group (no drug),
group 2 was given drug 1, group 3 was given drug 2, and group 4 was given both drugs.


TABLE 2.1

Data from Paired-Association Learning Task Experiment

            No Drug    Drug 1     Drug 2     Drugs 1 and 2
               1         12         12           13
               8         10          4           14
               9         13         11           14
               9         13          7           17
               4         12          8           11
               0         10         10           14
               1          -         12           13
               -          -          5           14
n              7          6          8            8
Sum           32         70         69          110
Median         4         12          9           14
Mean        4.5714    11.6667     8.6250      13.750
Variance   16.2857     1.8667     9.6964       2.786


The sample sizes, sums, medians, means and variances of each group's data are included 
in Table 2.1. 

The F-max statistic is $F_{\max}$ = 16.286/1.867 = 8.723. The liberal 5% critical point is obtained
from Table A.1 with $k = t = 4$ and $\nu = 7$. The critical point is 8.44 and since 8.723 > 8.44, one
rejects $H_0$: $\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_t^2$ versus $H_a$: (not $H_0$) with significance level 0.05, but cannot
reject at the $\alpha$ = 0.01 level.
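
As a quick check on the arithmetic, the F-max statistic can be recomputed from the raw data of Table 2.1. This is only a sketch in plain Python; the dictionary `errors` and the other names are illustrative, and the critical point still has to be read from Table A.1.

```python
from statistics import variance

# Number of errors per subject, copied from Table 2.1 (names are illustrative)
errors = {
    "No Drug": [1, 8, 9, 9, 4, 0, 1],
    "Drug 1": [12, 10, 13, 13, 12, 10],
    "Drug 2": [12, 4, 11, 7, 8, 10, 12, 5],
    "Drugs 1 and 2": [13, 14, 14, 17, 11, 14, 13, 14],
}

# Sample variances with divisor n_i - 1, as in Section 2.2
variances = {group: variance(y) for group, y in errors.items()}

# Hartley's F-max: largest sample variance over smallest sample variance
f_max = max(variances.values()) / min(variances.values())

# Degrees of freedom for the "liberal" test with unequal sample sizes
nu_liberal = max(len(y) for y in errors.values()) - 1

print(f"F-max = {f_max:.3f} with k = {len(errors)} and nu = {nu_liberal}")
```

Working from unrounded variances gives a value that differs from 8.723 only in the third decimal place.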

The computations for Bartlett's test are:

$$C = 1 + \frac{1}{3 \times 3}\left(\frac{1}{6} + \frac{1}{5} + \frac{1}{7} + \frac{1}{7} - \frac{1}{25}\right) = 1.0680$$

and

$$\hat{\sigma}^2 = \frac{6(16.2857) + 5(1.8667) + 7(9.6964) + 7(2.7860)}{25} = 7.7769$$

Thus

$$U = \frac{1}{C}\left[\nu \log_e(\hat{\sigma}^2) - \sum_{i=1}^{4} \nu_i \log_e(\hat{\sigma}_i^2)\right]$$

$$= \frac{1}{1.0680}\,[25 \log_e(7.7769) - 6 \log_e(16.2857) - 5 \log_e(1.8667) - 7 \log_e(9.6964) - 7 \log_e(2.7860)]$$

$$= 7.8111$$

The asymptotic sampling distribution associated with $U$ is that of a chi-square
distribution based on three degrees of freedom. The significance level of the test is 0.0501
and one would again conclude that the variances are unequal at an approximate 5%
significance level.
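
The Bartlett computations above can be reproduced with a few lines of plain Python. The sketch below assumes the data of Table 2.1 and follows Equation 2.2; the variable names are illustrative only.

```python
from math import log
from statistics import variance

# Data from Table 2.1: no drug, drug 1, drug 2, drugs 1 and 2
groups = [
    [1, 8, 9, 9, 4, 0, 1],
    [12, 10, 13, 13, 12, 10],
    [12, 4, 11, 7, 8, 10, 12, 5],
    [13, 14, 14, 17, 11, 14, 13, 14],
]

t = len(groups)
nu_i = [len(g) - 1 for g in groups]        # degrees of freedom per treatment
nu = sum(nu_i)                             # pooled degrees of freedom
s2_i = [variance(g) for g in groups]       # sample variances

# Pooled variance estimate
s2_pooled = sum(v * s for v, s in zip(nu_i, s2_i)) / nu

# Correction factor C and Bartlett's statistic U (Equation 2.2)
c_corr = 1 + (sum(1 / v for v in nu_i) - 1 / nu) / (3 * (t - 1))
u_stat = (nu * log(s2_pooled)
          - sum(v * log(s) for v, s in zip(nu_i, s2_i))) / c_corr

print(f"pooled variance = {s2_pooled:.4f}, C = {c_corr:.4f}, U = {u_stat:.4f}")
```

The statistic is then referred to a chi-square distribution with $t - 1 = 3$ degrees of freedom.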

The computations for Levene's test begin with the computation of the residuals or the 
deviations of the observations from the treatment means. Next the absolute values of the 
residuals are computed as illustrated in Table 2.2. Finally, a one-way analysis of variance is 
carried out on these absolute values of the residuals. The value of the resulting F-statistic 
is 6.97, which is based on 3 and 25 degrees of freedom. The observed significance level of 
Levene's test is 0.0015. The squared deviations or squared residuals version of Levene's test 
can be obtained by squaring the items in Table 2.2 before doing the analysis of variance. In 
this case, the value of the F-statistic is 7.36 and the observed significance level is 0.0011 (also 
based on 3 and 25 degrees of freedom). 

The Brown-Forsythe test statistic is obtained by computing the absolute value of
the deviations of the observations from the treatment median (medians are in Table 2.1).
Table 2.3 contains the absolute values of the deviations from the medians. Next, the
one-way analysis of variance provides an F-statistic of 5.49 and the observed significance level
is 0.0049 (also based on 3 and 25 degrees of freedom).
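
Levene's test, its squared-residual version, and the Brown-Forsythe test differ only in the transformation applied before the one-way analysis of variance. The following sketch, in plain Python with illustrative function and variable names, assumes the data of Table 2.1 and computes the three F-statistics.

```python
from statistics import mean, median

# Data from Table 2.1: no drug, drug 1, drug 2, drugs 1 and 2
groups = [
    [1, 8, 9, 9, 4, 0, 1],
    [12, 10, 13, 13, 12, 10],
    [12, 4, 11, 7, 8, 10, 12, 5],
    [13, 14, 14, 17, 11, 14, 13, 14],
]

def one_way_f(samples):
    """F-statistic of a one-way analysis of variance on a list of samples."""
    all_obs = [y for s in samples for y in s]
    grand = mean(all_obs)
    t, n_total = len(samples), len(all_obs)
    ss_between = sum(len(s) * (mean(s) - grand) ** 2 for s in samples)
    ss_within = sum((y - mean(s)) ** 2 for s in samples for y in s)
    return (ss_between / (t - 1)) / (ss_within / (n_total - t))

# Levene: absolute residuals from the treatment means (Table 2.2)
abs_resid = [[abs(y - mean(g)) for y in g] for g in groups]
f_levene = one_way_f(abs_resid)

# Levene, squared-residual version
f_levene_sq = one_way_f([[z ** 2 for z in zs] for zs in abs_resid])

# Brown-Forsythe: absolute deviations from the treatment medians (Table 2.3)
abs_dev_med = [[abs(y - median(g)) for y in g] for g in groups]
f_brown_forsythe = one_way_f(abs_dev_med)

print(f"Levene F = {f_levene:.2f}, squared version F = {f_levene_sq:.2f}, "
      f"Brown-Forsythe F = {f_brown_forsythe:.2f}")
```

Each statistic has 3 and 25 degrees of freedom; significance levels would come from an F distribution function, which is not part of the Python standard library.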

Table 2.4 contains the values of $r_{ij}$ computed using Equation 2.3 with $w$ = 0.5. The O'Brien
test statistic is obtained by carrying out an analysis of variance. The value of the F-statistic
is 6.30 and the observed significance level is 0.0025. The value of the F-statistic using
$w$ = 0.7 (computations not shown) is 5.90 and the observed significance level is 0.0035.
There are 3 and 25 degrees of freedom associated with each of O'Brien's tests.


TABLE 2.2

Values of $z_{ij} = |y_{ij} - \bar{y}_{i\cdot}|$ for Computing Levene's Test Where
$y_{ij}$ Values are from Table 2.1

No Drug    Drug 1    Drug 2    Drugs 1 and 2
 3.571     0.333     3.375        0.750
 3.429     1.667     4.625        0.250
 4.429     1.333     2.375        0.250
 4.429     1.333     1.625        3.250
 0.571     0.333     0.625        2.750
 4.571     1.667     1.375        0.250
 3.571       -       3.375        0.750
   -         -       3.625        0.250


TABLE 2.3

Absolute Values of Deviations of the Observations
from the Treatment Medians

No Drug    Drug 1    Drug 2    Drugs 1 and 2
   3         0         3            1
   4         2         5            0
   5         1         2            0
   5         1         2            3
   0         0         1            3
   4         2         1            0
   3         -         3            1
   -         -         4            0


TABLE 2.4

Scores Using w = 0.5 for O'Brien's Test

  Obr1      Obr2      Obr3      Obr4
 14.740    -0.083    13.295     0.464
 13.457     3.517    25.676    -0.155
 23.540     2.167     6.176    -0.155
 23.540     2.167     2.461    12.845
 -1.210    -0.083    -0.324     9.131
 25.190     3.517     1.533    -0.155
 14.740       -      13.295     0.464
    -         -      15.461    -0.155
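
The O'Brien scores of Table 2.4 and the resulting F-statistic can be reproduced directly from Equation 2.3. The sketch below is plain Python under the assumption that the data are those of Table 2.1; `obrien_scores` and `one_way_f` are illustrative helper names.

```python
from statistics import mean, variance

# Data from Table 2.1: no drug, drug 1, drug 2, drugs 1 and 2
groups = [
    [1, 8, 9, 9, 4, 0, 1],
    [12, 10, 13, 13, 12, 10],
    [12, 4, 11, 7, 8, 10, 12, 5],
    [13, 14, 14, 17, 11, 14, 13, 14],
]

def one_way_f(samples):
    """F-statistic of a one-way analysis of variance on a list of samples."""
    all_obs = [y for s in samples for y in s]
    grand = mean(all_obs)
    t, n_total = len(samples), len(all_obs)
    ss_between = sum(len(s) * (mean(s) - grand) ** 2 for s in samples)
    ss_within = sum((y - mean(s)) ** 2 for s in samples for y in s)
    return (ss_between / (t - 1)) / (ss_within / (n_total - t))

def obrien_scores(y, w=0.5):
    """O'Brien scores r_ij of Equation 2.3 for one treatment group."""
    n, ybar, s2 = len(y), mean(y), variance(y)
    return [((w + n - 2) * n * (v - ybar) ** 2 - w * s2 * (n - 1))
            / ((n - 1) * (n - 2)) for v in y]

scores = [obrien_scores(g, w=0.5) for g in groups]
f_obrien = one_way_f(scores)

# Same procedure with the alternative weight mentioned in the text
f_obrien_w07 = one_way_f([obrien_scores(g, w=0.7) for g in groups])

print(f"O'Brien F (w = 0.5) = {f_obrien:.2f}")
```

Within each group the mean of the scores equals the sample variance, which is why an analysis of variance on the scores compares the treatment variances.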

Each of the test statistics indicates that there is sufficient evidence to conclude that the 
variances are not equal. The group means of the absolute values of the residuals are shown 
in Table 2.5. Pairwise comparisons among these treatment absolute residual means are 
shown in Table 2.6. The means of the absolute values of the residuals for no drug and drug 2 
are not different, for drug 1 and drugs 1 and 2 are not different, but there are differences 
between these two sets. A simple model with two variances could be used to continue the 
analysis of the treatment means. Using a simple variance model will improve the power 
of some of the tests about the means. The two variance model and the corresponding 
comparisons of means will follow the discussion of the analysis using four variances. 


TABLE 2.5

Means of the Absolute Values of the Residuals

Group         Estimate    Standard Error    df    t-Value    Pr > |t|
Both drugs     1.0625        0.4278         25     2.48       0.0201
Drug 1         1.1111        0.4940         25     2.25       0.0336
Drug 2         2.6250        0.4278         25     6.14      <0.0001
No drug        3.5102        0.4574         25     7.67      <0.0001


TABLE 2.6

Pairwise Comparisons between the Group Means of the Absolute Values of the Residuals

Group         _Group      Estimate    Standard Error    df    t-Value    Pr > |t|
Both drugs    Drug 1      -0.04861       0.6535         25     -0.07      0.9413
Both drugs    Drug 2      -1.5625        0.6050         25     -2.58      0.0161
Both drugs    No drug     -2.4477        0.6263         25     -3.91      0.0006
Drug 1        Drug 2      -1.5139        0.6535         25     -2.32      0.0290
Drug 1        No drug     -2.3991        0.6732         25     -3.56      0.0015
Drug 2        No drug     -0.8852        0.6263         25     -1.41      0.1699


2.5 Inferences on Linear Combinations

The problems of testing hypotheses about and constructing confidence intervals for an
arbitrary linear combination of the treatment means, $\sum_{i=1}^{t} c_i \mu_i$, are discussed in this section
for the case where the variances $\sigma_i^2$ are too unequal to apply the tests and confidence intervals
discussed in Chapter 1. It is recommended that you use the procedures in this section and
the next if the equality of variance hypothesis is rejected at the 0.01 or 1% level. If there is
not sufficient evidence to believe that the variances are unequal, then one can use the
results in Chapter 1 to make inferences about the treatment means.

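The best estimate of $\sum_{i=1}^{t} c_i \mu_i$ is $\sum_{i=1}^{t} c_i \hat{\mu}_i$ and the sampling distribution is

$$\sum_{i=1}^{t} c_i \hat{\mu}_i \sim N\left(\sum_{i=1}^{t} c_i \mu_i,\ \sum_{i=1}^{t} c_i^2 \sigma_i^2 / n_i\right)$$

and thus,

$$\frac{\sum_{i=1}^{t} c_i \hat{\mu}_i - \sum_{i=1}^{t} c_i \mu_i}{\sqrt{\sum_{i=1}^{t} c_i^2 \sigma_i^2 / n_i}} \sim N(0, 1)$$

An obvious statistic to use for making inferences about $\sum_{i=1}^{t} c_i \mu_i$, when the variances are not
known and are unequal, is

$$z = \frac{\sum_{i=1}^{t} c_i \hat{\mu}_i - \sum_{i=1}^{t} c_i \mu_i}{\sqrt{\sum_{i=1}^{t} c_i^2 \hat{\sigma}_i^2 / n_i}}$$

If the $n_i$ corresponding to nonzero $c_i$ are all very large, one can reasonably assume that
$z$ has an approximate $N(0, 1)$ distribution, and hence $z$ can be used to make inferences
about $\sum_{i=1}^{t} c_i \mu_i$. In this case, an approximate $(1 - \alpha)100\%$ confidence interval for $\sum_{i=1}^{t} c_i \mu_i$ is
provided by

$$\sum_{i=1}^{t} c_i \hat{\mu}_i \pm z_{\alpha/2} \sqrt{\sum_{i=1}^{t} c_i^2 \hat{\sigma}_i^2 / n_i}$$

where $z_{\alpha/2}$ is the upper $\alpha/2$ critical point of the standard normal probability distribution.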

To test $H_0$: $\sum_{i=1}^{t} c_i \mu_i = a$ vs $H_a$: $\sum_{i=1}^{t} c_i \mu_i \ne a$, where $a$ is a specified constant, one could
calculate

$$z = \frac{\sum_{i=1}^{t} c_i \hat{\mu}_i - a}{\sqrt{\sum_{i=1}^{t} c_i^2 \hat{\sigma}_i^2 / n_i}}$$

and if $|z| > z_{\alpha/2}$, then reject $H_0$ at a significance level of $\alpha$.

In other instances, note that $z$ can be written as

$$z = \frac{\left(\sum_{i=1}^{t} c_i \hat{\mu}_i - \sum_{i=1}^{t} c_i \mu_i\right) \Big/ \sqrt{\sum_{i=1}^{t} c_i^2 \sigma_i^2 / n_i}}{\sqrt{\sum_{i=1}^{t} c_i^2 \hat{\sigma}_i^2 / n_i \Big/ \sum_{i=1}^{t} c_i^2 \sigma_i^2 / n_i}}$$

The numerator of $z$ has a standard normal distribution and the numerator and denominator
of $z$ are independently distributed. The distribution of $z$ could be approximated by a
$t(\nu)$ distribution if $\nu$ could be determined such that

$$V = \nu \times \frac{\sum_{i=1}^{t} c_i^2 \hat{\sigma}_i^2 / n_i}{\sum_{i=1}^{t} c_i^2 \sigma_i^2 / n_i}$$

is approximately distributed as $\chi^2(\nu)$. In order to get a good chi-square approximation to
the distribution of $V$ when the variances are unequal, select a chi-square distribution that
has the same first two moments as $V$. That is, to find $\nu$ for the case of unequal variances,
find $\nu$ so that the moments of $V$ are equal to the first two moments of a $\chi^2(\nu)$ distribution
(this is known as Satterthwaite's method). This results in determining that the approximate
number of degrees of freedom is

$$\nu = \frac{\left(\sum_{i=1}^{t} c_i^2 \sigma_i^2 / n_i\right)^2}{\sum_{i=1}^{t} \left[c_i^4 \sigma_i^4 / \left(n_i^2 (n_i - 1)\right)\right]}$$

Unfortunately, since $\nu$ depends on $\sigma_1^2, \sigma_2^2, \ldots, \sigma_t^2$, it cannot be determined exactly. The usual
procedure is to estimate $\nu$ by

$$\hat{\nu} = \frac{\left(\sum_{i=1}^{t} c_i^2 \hat{\sigma}_i^2 / n_i\right)^2}{\sum_{i=1}^{t} \left[c_i^4 \hat{\sigma}_i^4 / \left(n_i^2 (n_i - 1)\right)\right]} \tag{2.4}$$

Summarizing, one rejects $H_0$: $\sum_{i=1}^{t} c_i \mu_i = a$ vs $H_a$: $\sum_{i=1}^{t} c_i \mu_i \ne a$, if

$$|t_c| = \frac{\left|\sum_{i=1}^{t} c_i \hat{\mu}_i - a\right|}{\sqrt{\sum_{i=1}^{t} c_i^2 \hat{\sigma}_i^2 / n_i}} > t_{\alpha/2,\hat{\nu}}$$

where $\hat{\nu}$ is determined using Equation 2.4. An approximate $(1 - \alpha)100\%$ confidence
interval for $\sum_{i=1}^{t} c_i \mu_i$ is given by

$$\sum_{i=1}^{t} c_i \hat{\mu}_i \pm t_{\alpha/2,\hat{\nu}} \sqrt{\sum_{i=1}^{t} c_i^2 \hat{\sigma}_i^2 / n_i}$$

Unfortunately, every time one wants to test a new hypothesis or construct another
confidence interval, the degrees of freedom $\nu$ must be re-estimated. It can be shown
that $n_* - 1 \le \nu \le t(n^* - 1)$, where $n_* = \min\{n_1, n_2, \ldots, n_t\}$ and $n^* = \max\{n_1, n_2, \ldots, n_t\}$. Thus,
if $|t_c| > t_{\alpha/2,\,n_* - 1}$, one can be assured that $|t_c| > t_{\alpha/2,\nu}$, and if $|t_c| < t_{\alpha/2,\,t(n^* - 1)}$, one can be assured
that $|t_c| < t_{\alpha/2,\nu}$. In these cases, one can avoid calculating $\nu$. When $t_{\alpha/2,\,t(n^* - 1)} < |t_c| < t_{\alpha/2,\,n_* - 1}$,
the value of $\nu$ must be calculated in order to be sure whether one should reject or fail to
reject the null hypothesis being tested. For confidence intervals, $\nu$ should always be
calculated. Next, the preceding results are demonstrated with the drug errors example.


2.6 Example—Drugs and Errors (Continued) 

Consider the data in Table 2.1, and suppose the experimenter is interested in answering 
the following questions: 

1) On average, do drugs have any effect on learning at all? 

2) Do subjects make more errors when given both drugs than when given only one? 

3) Do the two drugs differ in their effects on the number of errors made? 

To answer the first question, one might test the hypothesis that the mean of the three
drug groups is equal to the control mean. That is, one would test

$$H_{01}: l_1 = \mu_1 - \frac{\mu_2 + \mu_3 + \mu_4}{3} = 0 \quad \text{vs} \quad H_{a1}: l_1 \ne 0$$

The estimate of this linear combination is

$$\hat{l}_1 = \hat{\mu}_1 - \frac{\hat{\mu}_2 + \hat{\mu}_3 + \hat{\mu}_4}{3} = 4.571 - \frac{1}{3}(34.042) = -6.776$$

and the estimate of the corresponding standard error of $\hat{l}_1$ is

$$\widehat{\mathrm{s.e.}}(\hat{l}_1) = \sqrt{\sum_{i=1}^{4} \frac{c_i^2 \hat{\sigma}_i^2}{n_i}} = \sqrt{\frac{\hat{\sigma}_1^2}{7} + \frac{1}{9}\left(\frac{\hat{\sigma}_2^2}{6} + \frac{\hat{\sigma}_3^2}{8} + \frac{\hat{\sigma}_4^2}{8}\right)} = \sqrt{2.535} = 1.592$$

The approximate degrees of freedom associated with this estimated standard error are
obtained by using

$$\sum_{i=1}^{4} \frac{c_i^4 \hat{\sigma}_i^4}{n_i^2 (n_i - 1)} = 0.9052$$

so that

$$\hat{\nu} = \frac{(2.535)^2}{0.9052} = 7.10$$

The value of the test statistic is $t_c$ = -6.776/1.592 = -4.256 with the observed significance
level $\hat{\alpha}$ = 0.0038.

A 95% confidence interval for $l_1$ is

$$\hat{l}_1 \pm t_{\alpha/2,\hat{\nu}} \times \widehat{\mathrm{s.e.}}(\hat{l}_1) = -6.776 \pm (2.365)(1.592)$$

which simplifies to

$$-10.54 < \mu_1 - \frac{\mu_2 + \mu_3 + \mu_4}{3} < -3.01$$

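Next test to see if the mean of the group given both drugs is equal to the
average of the means of the two groups given a single drug. That is, test

$$H_{02}: l_2 = \mu_4 - \frac{\mu_2 + \mu_3}{2} = 0 \quad \text{vs} \quad H_{a2}: l_2 \ne 0$$

The estimate of this linear combination is

$$\hat{l}_2 = \hat{\mu}_4 - \frac{\hat{\mu}_2 + \hat{\mu}_3}{2} = 3.6042$$

and its estimated standard error is

$$\widehat{\mathrm{s.e.}}(\hat{l}_2) = \sqrt{\frac{\hat{\sigma}_4^2}{8} + \frac{1}{4}\left(\frac{\hat{\sigma}_2^2}{6} + \frac{\hat{\sigma}_3^2}{8}\right)} = \sqrt{0.7290} = 0.8538$$
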

The value of the test statistic is $t_c$ = 3.6042/0.8538 = 4.221, which is significant at $\alpha$ = 0.01
since $|t_c| > t_{0.005,5}$. In this case, the value of $\nu$ need not be computed, since $n_* - 1 = 5$ can be
used as the approximating degrees of freedom. The computed value of $\hat{\nu}$ is 16.8, which would be
needed if one wanted to construct a confidence interval about $l_2$.

Finally, to test the hypothesis to see if the two drug means differ, test $H_0$: $l_3 = \mu_2 - \mu_3 = 0$ vs
$H_a$: $l_3 = \mu_2 - \mu_3 \ne 0$. The estimate of this linear combination is $\hat{l}_3 = \hat{\mu}_2 - \hat{\mu}_3 = 3.042$ and its
estimated standard error is

$$\widehat{\mathrm{s.e.}}(\hat{l}_3) = \sqrt{\sum_{i=1}^{4} \frac{c_i^2 \hat{\sigma}_i^2}{n_i}} = \sqrt{\frac{\hat{\sigma}_2^2}{6} + \frac{\hat{\sigma}_3^2}{8}} = \sqrt{1.523} = 1.234$$

The approximate number of degrees of freedom is computed using

$$\sum_{i=1}^{4} \frac{c_i^4 \hat{\sigma}_i^4}{n_i^2 (n_i - 1)} = 0.229$$

so that

$$\hat{\nu} = \frac{(1.523)^2}{0.229} = 10.1$$

Thus, $t_c$ = 3.042/1.234 = 2.465, which has an observed significance level of $\hat{\alpha}$ = 0.0334.
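
The three comparisons above all follow the same recipe: estimate the linear combination, estimate its standard error, and approximate the degrees of freedom with Equation 2.4. The following sketch, in plain Python with an illustrative function name, assumes the summary statistics of Table 2.1 and repeats those computations.

```python
from math import sqrt

# Summary statistics from Table 2.1 (no drug, drug 1, drug 2, drugs 1 and 2)
n = [7, 6, 8, 8]
ybar = [4.5714, 11.6667, 8.6250, 13.7500]
s2 = [16.2857, 1.8667, 9.6964, 2.7857]

def unequal_variance_contrast(c, a=0.0):
    """Estimate, standard error, Satterthwaite df (Equation 2.4) and t_c
    for H0: sum(c_i * mu_i) = a under the unequal variance model."""
    est = sum(ci * m for ci, m in zip(c, ybar))
    # c_i^2 * sigma_i^2 / n_i for each treatment
    terms = [ci ** 2 * v / ni for ci, v, ni in zip(c, s2, n)]
    var_l = sum(terms)
    se = sqrt(var_l)
    # denominator of Equation 2.4: sum of c_i^4 sigma_i^4 / (n_i^2 (n_i - 1))
    df = var_l ** 2 / sum(tm ** 2 / (ni - 1) for tm, ni in zip(terms, n))
    return est, se, df, (est - a) / se

l1 = unequal_variance_contrast([1, -1 / 3, -1 / 3, -1 / 3])   # control vs drugs
l2 = unequal_variance_contrast([0, -1 / 2, -1 / 2, 1])        # both vs single
l3 = unequal_variance_contrast([0, 1, -1, 0])                 # drug 1 vs drug 2

for name, (est, se, df, tc) in zip(("l1", "l2", "l3"), (l1, l2, l3)):
    print(f"{name}: estimate = {est:.3f}, s.e. = {se:.3f}, "
          f"df = {df:.2f}, t_c = {tc:.3f}")
```

Critical points and observed significance levels would still be taken from a t table or a t distribution function using the estimated degrees of freedom.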


2.7 General Satterthwaite Approximation for Degrees of Freedom 

The Satterthwaite approximation to the number of degrees of freedom associated with 
estimated standard error is obtained from 

v _ 2 *( E {[ s 2 ( Q ] 2 }) 2 

Var{[s. e .(/)] 2 } 


where [£e.(/)] 2 is used to estimate E[s.e.(/)] 2 and the Var[s7>.(/)] 2 is estimated by 2/ , cfaff 
[nf(n t - 1)]. For more complex models, Var[s.e.(/)] 2 can be approximated by using a first- 
order Taylor's series (Kendall and Stuart, 1952) as q'Mq where M is the estimated asymp¬ 
totic covariance matrix of the estimates of the variances and the elements of the vector q are 
the first derivatives of E[s.e.(l)] 2 with respect to the individual variances, that is. 


E[(s.e.(/)] : 

da 


The q t are evaluated at the estimated values of each treatment's variances (Montgomery and 
Runger, 1993,1994). When the data from each of the samples are normally distributed, then 

(n 4 -l)<T? 

is distributed as a central chi-square random variable. Thus E(aJ) = o . and Var(<7,) = 
2af/(Uj - 1). Let the linear combination of interest be / = 2/ , cp,, which has variance 
af = i; =1 cfaf Mj. The partial derivative of ay with respect to a 2 is 
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dof _ c- 

do; n, 

The approximate variance of of obtained using the Taylor's series first-order approxi¬ 
mation is 


Var(of) = 


X 


)± 

2 

\ 2af T 

\_ n i_ 


n i- ij 


2 


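The next step is to replace the population variances with their corresponding sample estimates, providing the approximating degrees of freedom

$$\hat{\nu} = \frac{2\left(E\{[\text{s.e.}(\hat{l})]^2\}\right)^2}{\text{Var}\{[\text{s.e.}(\hat{l})]^2\}} = \frac{\left(\sum_{i=1}^{t} c_i^2 \hat{\sigma}_i^2/n_i\right)^2}{\sum_{i=1}^{t} c_i^4 \hat{\sigma}_i^4/[n_i^2(n_i - 1)]}$$

the same as that provided by the Satterthwaite approximation above.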
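As a numerical check on the derivation above, the following Python sketch evaluates the $q'Mq$ form for the contrast $\mu_2 - \mu_3$ of the drug example, using the sample sizes and sample variances of the four drug groups. The function name `satterthwaite` and its argument layout are illustrative assumptions and are not part of the SAS analyses used in this chapter.

```python
import math

def satterthwaite(c, s2, n):
    """Return (se, df) for l_hat = sum_i c_i * ybar_i with unequal variances.

    c  : contrast coefficients c_i
    s2 : sample variances (estimates of sigma_i^2)
    n  : sample sizes n_i
    """
    # Estimated variance of l_hat: sum_i c_i^2 * s_i^2 / n_i
    var_l = sum(ci ** 2 * si / ni for ci, si, ni in zip(c, s2, n))
    # Gradient q_i = c_i^2 / n_i and diagonal of M: 2 * s_i^4 / (n_i - 1)
    q = [ci ** 2 / ni for ci, ni in zip(c, n)]
    m = [2.0 * si ** 2 / (ni - 1) for si, ni in zip(s2, n)]
    var_of_var = sum(qi ** 2 * mi for qi, mi in zip(q, m))  # q' M q
    df = 2.0 * var_l ** 2 / var_of_var
    return math.sqrt(var_l), df

# Sample sizes and variances of the four drug groups (Drug 1 .. Drug 4)
n = [7, 6, 8, 8]
s2 = [16.2857, 1.8667, 9.69643, 2.7857]
c = [0, 1, -1, 0]  # contrast mu_2 - mu_3

se, df = satterthwaite(c, s2, n)
print(round(se, 3), round(df, 1))  # prints: 1.234 10.1
```

Because $M$ is diagonal here, $q'Mq$ reduces to a single sum, which is why the result agrees with the closed-form expression.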


2.8 Comparing All Means 

As previously stated, the usual F-test is very robust when the variances are unequal, provided that the sample sizes are nearly equal or provided that the larger sample sizes correspond to the samples from populations with the larger variances. In this section, two additional tests of the hypothesis of equal means are provided. The first test of the equal means hypothesis, $H_0: \mu_1 = \mu_2 = \cdots = \mu_t$ vs $H_a$: (not $H_0$), is given by Welch (1951), and is known as Welch's test. Define weights $W_i = n_i/\hat{\sigma}_i^2$, let $\bar{y}^* = \sum_{i=1}^{t} W_i \bar{y}_{i\cdot}/\sum_{i=1}^{t} W_i$ be a weighted average of the sample means, and let

$$\Lambda = \sum_{i=1}^{t} \frac{(1 - W_i/W_\cdot)^2}{n_i - 1}$$

where $W_\cdot = \sum_{i=1}^{t} W_i$. Then Welch's test statistic is

$$F_c = \frac{\sum_{i=1}^{t} W_i (\bar{y}_{i\cdot} - \bar{y}^*)^2/(t - 1)}{1 + 2(t - 2)\Lambda/(t^2 - 1)} \qquad (2.5)$$

which has an approximate F-distribution with numerator and denominator degrees of freedom $\nu_1 = t - 1$ and $\nu_2 = (t^2 - 1)/(3\Lambda)$, respectively. Thus, the null hypothesis $H_0: \mu_1 = \mu_2 = \cdots = \mu_t$ is rejected if $F_c > F_{\alpha,\nu_1,\nu_2}$. The numerator of Equation 2.5 can also be computed as $\left[\sum_{i=1}^{t} W_i \bar{y}_{i\cdot}^2 - W_\cdot \bar{y}^{*2}\right]/(t - 1)$. The procedure is demonstrated using the data from Section 2.4 and the preliminary computations are provided in Table 2.7.

TABLE 2.7
Quantities for Computing Welch's Test

i                      Drug 1     Drug 2     Drug 3     Drug 4
$n_i$                  7          6          8          8
$\bar{y}_{i\cdot}$     4.5714     11.6667    8.62500    13.7500
$\hat{\sigma}_i^2$     16.2857    1.8667     9.69643    2.7857
$W_i$                  0.4298     3.2143     0.82505    2.8718

From the above information compute $W_\cdot = 7.341$, $\bar{y}^* = 11.724$,

$$\Lambda = \frac{(1 - 0.430/7.341)^2}{6} + \frac{(1 - 3.214/7.341)^2}{5} + \frac{(1 - 0.825/7.341)^2}{7} + \frac{(1 - 2.872/7.341)^2}{7} = 0.376$$

and $\sum_{i=1}^{t} W_i \bar{y}_{i\cdot}^2 - W_\cdot \bar{y}^{*2} = 1050.8069 - 1009.0954 = 41.7114$.

The value of Welch's test statistic is

$$F_c = \frac{41.7114/3}{1 + 2 \times 2 \times 0.376/15} = \frac{13.9038}{1.1003} = 12.6355$$

with $\nu_1 = 3$ and $\nu_2 = 15/(3 \times 0.376) = 13.283$ degrees of freedom. The observed significance probability corresponding to $F_c$ is $\hat{\alpha} = 0.00035$. For comparison purposes, the usual F-statistic is $F_c = 14.91$ with 3 and 25 degrees of freedom. Welch's test can be obtained using SAS®-GLM by specifying WELCH as an option on the MEANS statement. Table 2.8 contains the GLM code used to provide the BF test for equality of variances and Welch's test for equality of means. The important parts of the output are in the second part of Table 2.8. Other tests for equality of variances can be obtained by specifying O'Brien, Levene or Bartlett in the HOVTEST= option.

TABLE 2.8
SAS-GLM Code to Provide the Brown-Forsythe's Test of Equality of Variances
and to Provide the Welch Test of Equal Means with the Unequal Variance Model

proc glm data=task;
class group;
model errors=group;
means group/HOVTEST=BF WELCH;
format group druggrps.;
run;

Welch's Test

Source    df         F-Value    Pr > F
Group     3          12.64      0.0003
Error     13.2830

Brown and Forsythe's Test for Homogeneity of Errors Variance
ANOVA of Absolute Deviations from Group Medians

Source    df    Sum of Squares    Mean Square    F-Value    Pr > F
Group     3     31.3762           10.4587        5.49       0.0049
Error     25    47.5893           1.9036
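The hand computation of Welch's statistic can be reproduced with a few lines of Python. This is only a sketch: the function name `welch_test` is an assumption, the inputs are the group sizes, means and variances listed in Table 2.7, and the factor $2(t - 2)$ in the denominator follows the worked computation above, where $t = 4$ gives $2 \times 2$.

```python
def welch_test(nobs, means, variances):
    """Return (F_c, nu1, nu2) for Welch's test of equal means."""
    t = len(nobs)
    w = [n / v for n, v in zip(nobs, variances)]            # W_i = n_i / s_i^2
    w_dot = sum(w)                                          # W. = sum of W_i
    y_star = sum(wi * m for wi, m in zip(w, means)) / w_dot # weighted mean
    lam = sum((1 - wi / w_dot) ** 2 / (n - 1) for wi, n in zip(w, nobs))
    numerator = sum(wi * (m - y_star) ** 2 for wi, m in zip(w, means)) / (t - 1)
    denominator = 1 + 2 * (t - 2) * lam / (t ** 2 - 1)
    nu2 = (t ** 2 - 1) / (3 * lam)
    return numerator / denominator, t - 1, nu2

# Values taken from Table 2.7 (Drug 1 .. Drug 4)
nobs = [7, 6, 8, 8]
means = [4.5714, 11.6667, 8.625, 13.75]
variances = [16.2857, 1.8667, 9.69643, 2.7857]

f_c, nu1, nu2 = welch_test(nobs, means, variances)
print(round(f_c, 2), nu1, round(nu2, 2))  # prints: 12.64 3 13.28
```

The printed values agree with the Welch's Test portion of Table 2.8 (F-value 12.64 with 3 and 13.2830 degrees of freedom). The p-value is omitted because it requires an F-distribution routine outside the Python standard library.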

The second procedure for testing the equality of the treatment means is obtained from generalizing the process of testing a hypothesis about a set of linear combinations of the $\mu_i$. Suppose a hypothesis is formed involving $r$ independent linear combinations of the $\mu_i$, such as $H_0: \sum_{i=1}^{t} c_{1i}\mu_i = 0, \sum_{i=1}^{t} c_{2i}\mu_i = 0, \ldots, \sum_{i=1}^{t} c_{ri}\mu_i = 0$ vs $H_a$: (not $H_0$). Let $C$ be an $r \times t$ matrix where the $k$th row contains the coefficients of the $k$th linear combination. If one assumes the data from each of the populations or treatments are normally distributed, then the joint sampling distribution of the vector of treatment means is $\hat{\mu} \sim N[\mu, V]$, where $V$ is a diagonal matrix whose $i$th diagonal element is $\sigma_i^2/n_i$. The joint sampling distribution of the set of linear combinations $C\hat{\mu}$ is $C\hat{\mu} \sim N[C\mu, CVC']$. The sum of squares due to deviations from the null hypothesis is $SSH_0 = [C\hat{\mu}]'[C\hat{V}C']^{-1}[C\hat{\mu}]$, which is asymptotically distributed as a chi-square distribution with $r$ degrees of freedom. An approximate small sample size statistic is $F_c = SSH_0/r$, with the approximating distribution being F with $r$ and $\nu$ degrees of freedom, where $\nu$ needs to be approximated (Fai and Cornelius, 1996; SAS Institute, Inc., 1999, p. 2118). The computation of the approximate degrees of freedom starts with carrying out a spectral decomposition $C\hat{V}C' = QDQ'$, where $D$ is an $r \times r$ diagonal matrix having the characteristic roots of $C\hat{V}C'$ as diagonal elements and where $Q$ is an $r \times r$ orthogonal matrix of the corresponding characteristic vectors of $C\hat{V}C'$. Let $z_k'$ be the $k$th row of $Q'C$, and let

$$\nu_k = \frac{2 d_k^2}{b_k' M b_k}$$

where $d_k$ is the $k$th diagonal element of $D$, $b_k$ contains the partial derivatives of $z_k' V z_k$ with respect to each of the variance parameters in $V$ evaluated at the estimates of the variances, and $M$ is the asymptotic covariance matrix of the vector of variances. Let

$$S = \sum_{k=1}^{r} \frac{\nu_k}{\nu_k - 2} I[\nu_k > 2]$$

where $I[\nu_k > 2]$ is an indicator function with the value of 1 when $\nu_k > 2$ and 0 otherwise. The approximate denominator degrees of freedom for the distribution of $F_c$ are

$$\nu = \begin{cases} \dfrac{2S}{S - r} & \text{if } S > r \\ 0 & \text{if } S \le r \end{cases}$$

The above process can be used to provide a test of the equal means hypothesis by selecting a set of $t - 1$ linearly independent contrasts of the $\mu_i$.
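To make the quadratic form concrete, the sketch below computes $F_c = SSH_0/r$ for the drug example using the $t - 1 = 3$ successive-difference contrasts. It is an illustration under stated assumptions: the helper names `solve` and `wald_f` are hypothetical, the linear system is solved with a small hand-written Gaussian elimination to stay within the Python standard library, and the denominator degrees of freedom are not computed. The value it produces is the unadjusted statistic and need not match the F-value that SAS-Mixed reports when a degrees-of-freedom option such as `ddfm=kr` also adjusts the statistic.

```python
def solve(a, b):
    """Solve the square linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def wald_f(contrasts, means, variances, nobs):
    """F_c = SSH_0 / r, with SSH_0 = (C m)' (C V C')^{-1} (C m) and V = diag(s_i^2 / n_i)."""
    v = [s2 / n for s2, n in zip(variances, nobs)]
    cm = [sum(c * m for c, m in zip(row, means)) for row in contrasts]
    cvc = [[sum(a * vi * b for a, vi, b in zip(r1, v, r2)) for r2 in contrasts]
           for r1 in contrasts]
    x = solve(cvc, cm)                      # x = (C V C')^{-1} (C m)
    ssh0 = sum(ci * xi for ci, xi in zip(cm, x))
    return ssh0 / len(contrasts)

# Three linearly independent contrasts among the four drug means
contrasts = [[1, -1, 0, 0], [0, 1, -1, 0], [0, 0, 1, -1]]
means = [4.5714, 11.6667, 8.625, 13.75]
variances = [16.2857, 1.8667, 9.69643, 2.7857]
nobs = [7, 6, 8, 8]

f_wald = wald_f(contrasts, means, variances, nobs)
print(round(f_wald, 2))
```

For the equal means hypothesis the result does not depend on which set of $t - 1$ independent contrasts is chosen, and $SSH_0$ coincides with $\sum_i W_i(\bar{y}_{i\cdot} - \bar{y}^*)^2$, the quantity in the numerator of Welch's statistic.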

The SAS-Mixed procedure implements a version of this approximation to the denominator degrees of freedom associated with an approximate F statistic with multiple degrees of freedom in the numerator. SAS-Mixed can be used to fit models with unequal variances per treatment group or unequal variances in some other prespecified pattern using the REPEATED statement and specifying the GROUP= option. The Mixed code in Table 2.9 was used to fit the unequal variance model to the data in Table 2.1. The REPEATED statement is used to specify that a different variance (one for each value of group) is to be estimated for each treatment. The three ESTIMATE statements are used to provide the computations corresponding to the three questions in Section 2.6.

TABLE 2.9
SAS-Mixed Code to Fit the Unequal Variance Model to the Data in Table 2.1

proc mixed cl covtest data=task;
class group;
model errors=group/ddfm=kr;
repeated/group=group;
estimate "part(1)" group -1 -1 -1 3/divisor=3 cl alpha=0.05;
estimate "part(2)" group 2 -1 -1 0/divisor=2 cl alpha=0.05;
estimate "part(3)" group 0 1 -1 0/cl alpha=0.05;
lsmeans group/diff cl;
run;

The results from the Mixed procedure are given in Table 2.10, where the Covariance Parameter Estimates are the estimates of the four treatment variances, AIC in the Fit Statistics is the Akaike Information Criterion (Akaike, 1974), the Null Model Likelihood Ratio Test provides a test of the equal variance hypothesis, the Type III Tests of Fixed Effects provide the test of the equal means hypothesis using the second statistic and the corresponding approximate degrees of freedom for the denominator, and the Estimates contain the results corresponding to the three questions in Section 2.6, where t-statistics, approximate denominator degrees of freedom, and 95% confidence intervals are provided. Table 2.11 contains the estimated treatment means with their corresponding estimated standard errors. The denominator degrees of freedom are the degrees of freedom corresponding to their respective variances. The second part of Table 2.11 contains the pairwise comparisons of the treatment means, including the approximate denominator degrees of freedom for each comparison. This model could be simplified by using one variance for drug 1 and both drugs and one variance for drug 2 and no drug. This can be accomplished by defining a variable, say T, to be 1 for drug 1 and both drugs and 0 for the other two treatments. Then place T in the class statement and use Repeated/Group = T; in the model specification. The estimates of the two variances are 2.4028 and 12.7376 and the AIC is 126.4, which is a smaller AIC value than that for the four variance model, indicating the two variance model is adequate to describe the data. Using a model with fewer variances in the model specification provides more degrees of freedom for the respective standard errors and thus provides more powerful tests of hypotheses concerning the fixed effects in the model.

TABLE 2.10
Results of Fitting the Unequal Variance Model to the Data in Table 2.1

Covariance Parameter Estimates

Covariance                            Standard
Parameter   Group        Estimate    Error     Z-Value   Pr Z     α      Lower     Upper
Residual    Both drugs   2.7857      1.4890    1.87      0.0307   0.05   1.2178    11.5394
Residual    Drug 1       1.8667      1.1806    1.58      0.0569   0.05   0.7273    11.2286
Residual    Drug 2       9.6964      5.1830    1.87      0.0307   0.05   4.2388    40.1658
Residual    No drug      16.2857     9.4026    1.73      0.0416   0.05   6.7625    78.9710

Fit Statistics

AIC (smaller is better)   129.8

Null Model Likelihood Ratio Test

df   Chi-Square   Pr > Chi-Square
3    8.34         0.0394

Type III Tests of Fixed Effects

Effect   Num df   Den df   F-Value   Pr > F
group    3        11.8     12.53     0.0006

Estimates

Label    Estimate   Standard Error   df     t-Value   Pr > |t|   α      Lower      Upper
Part 1   -6.7758    1.5920           7.1    -4.26     0.0036     0.05   -10.5299   -3.0217
Part 2   3.6042     0.8538           16.8   4.22      0.0006     0.05   1.8011     5.4073
Part 3   3.0417     1.2342           10.1   2.46      0.0332     0.05   0.2962     5.7871

TABLE 2.11
Estimates of the Drug Group Means and Pairwise Comparisons Using the Unequal
Variance Model

Least Squares Means

Effect   Group        Estimate   Standard Error   df   t-Value   Pr > |t|   α      Lower     Upper
Group    Both drugs   13.7500    0.5901           7    23.30     <0.0001    0.05   12.3546   15.1454
Group    Drug 1       11.6667    0.5578           5    20.92     <0.0001    0.05   10.2329   13.1005
Group    Drug 2       8.6250     1.1009           7    7.83      0.0001     0.05   6.0217    11.2283
Group    No drug      4.5714     1.5253           6    3.00      0.0241     0.05   0.8392    8.3037

Differences of Least Squares Means

                                                Standard
Effect   Group        _Group    Estimate   Error    df     t-Value   Pr > |t|   α      Lower      Upper
Group    Both drugs   Drug 1    2.0833     0.8120   11.9   2.57      0.0249     0.05   0.3117     3.8550
Group    Both drugs   Drug 2    5.1250     1.2491   10.7   4.10      0.0018     0.05   2.3668     7.8832
Group    Both drugs   No drug   9.1786     1.6355   7.78   5.61      0.0006     0.05   5.3886     12.9685
Group    Drug 1       Drug 2    3.0417     1.2342   10.1   2.46      0.0332     0.05   0.2962     5.7871
Group    Drug 1       No drug   7.0952     1.6241   7.55   4.37      0.0027     0.05   3.3109     10.8796
Group    Drug 2       No drug   4.0536     1.8811   11.3   2.15      0.0536     0.05   -0.07507   8.1822


2.9 Concluding Remarks 

In summary, for comparing all means, the following are recommended: 

1) If the homogeneity of variance test is not significant at the 1% level, do the usual
analysis of variance test.

2) If the homogeneity of variance test is significant at the 1% level, use either Welch's
test or the mixed models test and the corresponding approximate denominator
degrees of freedom.

3) If the homogeneity of variance test is significant at the 1% level, use the AIC to
determine whether a simpler model with fewer variances can be used to adequately
describe the data in order to increase the power of tests concerning the means.

Many textbooks and articles have been written about using transformations on data in
order to achieve equal treatment variances so that the usual analysis of variance can be used
to compare the treatments. With the ability to fit an unequal variance model to provide
estimated standard errors of means and comparisons of means, many situations will not
require the use of transformations. One major benefit of not having to use a transformation
to achieve equal variances is that the units of the means are in the units of measurement,
thus simplifying interpretations.

This chapter contains discussion about the statistical analysis of a one-way analysis of 
variance model with heterogeneous errors. The discussion included several statistical tests 
for determining homogeneity of the error variances and recommendations on when to use 
each test. Procedures appropriate for making statistical inferences about the effects of 
different treatments upon discovering heterogeneous error variances as well as examples 
illustrating the use of these procedures were also reviewed. 


2.10 Exercises 

2.1 The following data are body temperatures of calves that were vaccinated and
then challenged to determine if the vaccination protected the animal. Test the
equality of variances of the treatment groups using two or more techniques.
Based on the results of the test of equality of variances, test the equality of the
treatment means using both Welch's and the mixed model F-statistics and make
all pairwise comparisons.


Data for Exercise 2.1 


Vaccine A 

Vaccine B 

Vaccine C 

Vaccine D 

Vaccine E 

Vaccine F 

Vaccine G 

101.5 

96.3 

101.8 

97.3 

97.5 

96.9 

97.3 

100.5 

97.2 

97.4 

96.8 

96.4 

97.1 

100.7 

104.5 

99.3 

104.9 

97.1 

98.6 

96.8 

103.3 

102.3 

98.0 

104.0 

97.0 

96.6 

97.0 

100.2 

100.6 

97.6 

103.7 

97.1 


96.2 

103.5 

97.7 

96.8 

104.5 

96.9 


96.6 



99.1 

100.4 

96.1 





96.7 

102.2 

96.3 





96.4 

100.2 

96.7 







97.1 





2.2 Use the data in Table 1.1 and test the equality of variances using several of the 
methods described in Section 2.2. What is your conclusion? 

2.3 The data in the following table are times required for a student to dissolve a
piece of chocolate candy in their mouth. Each time represents one piece of candy
dissolved by one student. Provide a detailed analysis of the data set and provide
tests of the following hypotheses:

1) The mean of the Blue Choc = the mean of the Red Choc.

2) The mean of the Buttons = the mean of the means of the Blue Choc and
Red Choc.

3) The mean of the ChocChip = the mean of the WChocChip.

4) The mean of the Small Choc = 1/2 the mean of the means of the Blue Choc and
Red Choc.

5) The mean of the Blue Choc and Red Choc = the mean of the ChocChip and
WChocChip.


Data for Exercise 2.3 


Buttons 

Blue Choc 

Small Choc 

ChocChip 

WChocChip 

Red Choc 

69 

57 

28 

52 

35 

47 

76 

41 

27 

50 

37 

70 

59 

70 

28 

60 

38 

48 

55 

66 

30 

55 

40 

51 

68 

48 

29 

57 

34 

42 

34 

62 

28 

49 

35 


35 


24 


36 



2.4 The following data are the amounts of force (kg) required to fracture a concrete
beam constructed from one of three beam designs. Unequal sample sizes
occurred because of problems with the pouring of the concrete into the forms for
each of the designs.

1) Write out an appropriate model to describe the data and describe each
component of the model.

2) Estimate the parameters of the model in part 1.

3) Use Levene's, O'Brien's, and Brown-Forsythe's methods to test the equality
of the variances.

4) Use Welch's test to test $H_0: \mu_1 = \mu_2 = \mu_3$ vs $H_a$: (not $H_0$).

5) Use the mixed model F-test to test $H_0: \mu_1 = \mu_2 = \mu_3$ vs $H_a$: (not $H_0$).


Data for Exercise 2.4

Design   Beam 1   Beam 2   Beam 3   Beam 4   Beam 5   Beam 6   Beam 7   Beam 8   Beam 9   Beam 10
1        195      232      209      201      216      211      205
2        231      215      230      221      218      227      218      219
3        223      226      223      224      224      226      227      224      226      226


2.5 Four rations with different amounts of cellulose were evaluated as to the amount
of feed required for a chicken to gain one pound during the trial. Twenty-four
chickens were randomly assigned to the four rations (six chickens per ration)
and the chickens were raised in individual cages.

1) Write out an appropriate model to describe the data and describe each
component of the model.

2) Estimate the parameters of the model in part 1.

3) Use Levene's, O'Brien's, and Brown-Forsythe's methods to test the equality
of the variances.

4) Use Welch's test to test $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$ vs $H_a$: (not $H_0$).

5) Use the mixed model F-test to test $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$ vs $H_a$: (not $H_0$).

6) Construct 90% confidence intervals about $c_1$, $c_2$, and $c_3$ where $c_1 = \mu_1 - \mu_2 +
\mu_3 - \mu_4$, $c_2 = \mu_1 + \mu_2 - \mu_3 - \mu_4$, and $c_3 = \mu_1 - \mu_2 - \mu_3 + \mu_4$.


Data for Exercise 2.5

           Chick 1   Chick 2   Chick 3   Chick 4   Chick 5   Chick 6
Ration 1   2.60      2.54      2.87      2.33      2.45      2.77
Ration 2   3.87      3.18      2.59      3.62      2.71      3.08
Ration 3   2.69      5.31      2.08      4.00      3.12      4.19
Ration 4   4.43      5.59      5.06      4.17      5.17      4.47





Simultaneous Inference Procedures and 
Multiple Comparisons 


Often an experimenter wants to compare several functions of the $\mu_i$ in the same experiment, leading to a multiple testing situation. Experimenters should consider all functions of the $\mu_i$ that are of interest; that is, they should attempt to answer all questions of interest about relationships among the treatment means. The overriding reason to include more than two treatments in an experiment or study is to be able to estimate and/or test hypotheses about several relationships among the treatment means. Often the treatments are selected to provide a structure of comparisons of interest (see, for example, the drug experiment in Section 2.4). At other times, the experimenter may be interested in comparing each treatment to all other treatments, that is, making all pairwise comparisons. This would be the case, for example, when one is comparing the yields of several varieties of wheat or for any other set of treatments that have been selected for a comparative study.

One concern when making several comparisons in a single experiment is whether significant differences observed are due to real differences or simply to making a very large number of comparisons. Making a large number of comparisons increases the chance of finding differences that appear to be significant when they are not. For example, if an experimenter conducts 25 independent tests in an experiment and finds one significant difference at the 0.05 level, she should not put too much faith in the result because, on average, she should expect to find (0.05)(25) = 1.25 significant differences just by chance alone. Thus, if an experimenter is answering a large number of questions with one experiment (which we believe one should do), it is desirable to have a procedure that indicates whether the differences might be the result of chance alone. Fisher (1949) addressed this problem when he put forward the protected least significant difference (LSD) procedure. Since then, many authors have contributed to the area of multiple testing, where procedures for numerous settings have been developed.
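The arithmetic in the 25-test example can be checked directly, and the same calculation gives the chance of at least one false positive when the tests are independent and every null hypothesis is true. The function names below are illustrative assumptions.

```python
def expected_false_positives(alpha, k):
    """Expected number of significant results among k tests when all nulls are true."""
    return alpha * k

def prob_at_least_one(alpha, k):
    """P(at least one false positive) for k independent tests at level alpha."""
    return 1 - (1 - alpha) ** k

print(round(expected_false_positives(0.05, 25), 2))  # prints: 1.25
print(round(prob_at_least_one(0.05, 25), 4))         # prints: 0.7226
```

Under these assumptions, roughly 72% of such experiments would show at least one significant difference by chance alone.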

In this chapter, several well-known and commonly used procedures for making multiple inferences are discussed and compared. Some of the procedures are primarily used for testing hypotheses, while others can also be used to obtain simultaneous confidence intervals; that is, a set of confidence intervals for a set of functions of the $\mu_i$ can be derived for which one can be 95% confident that all the confidence intervals simultaneously contain their respective functions of the $\mu_i$.


3.1 Error Rates 

One of the main ways to evaluate and compare multiple comparison procedures is to calculate error rates. If a given confidence interval does not contain the true value of the quantity being estimated, then an error occurs. Similarly, if a hypothesis test is used, an error is made whenever a true hypothesis is rejected or a false hypothesis is not rejected. Next, four kinds of error rates are defined.

Definition 3.1: The comparisonwise error rate is equal to the ratio of the number of incorrect inferences made to the total number of inferences made in all experiments analyzed.

Definition 3.2: The experimentwise error rate (EER) is equal to the ratio of the number of experiments in which at least one error is made to the total number of experiments analyzed. It is the probability of making at least one error in an experiment when there are no differences between the treatments. The EER is also referred to as the experimentwise error rate under the complete null hypothesis (EERC).

Definition 3.3: The familywise error rate (FWER) (Westfall et al., 1999) is the probability of making at least one erroneous inference for a predefined set of k comparisons or confidence intervals. The set of k comparisons or confidence intervals is called the family of inferences.

Definition 3.4: The false discovery rate (FDR) (Benjamini and Hochberg, 1995) is the expected proportion of falsely rejected hypotheses among those that were rejected.

The EER controls the error rate when the null hypothesis is that all of the treatments are equally effective, that is, there are no differences among the treatment means. But many experiments involve a selected set of treatments where there are known differences among some treatments. Instead of an all means equal null hypothesis, there may be a collection of k null hypotheses, $H_{01}, H_{02}, \ldots, H_{0k}$, about the set of t means. These k null hypotheses are called partial null hypotheses and the error rate is controlled by using a method that controls the FWER (Westfall et al., 1999). For example, the set of treatments in Exercise 2.3 consists of six candy types: Buttons, Blue Choc, Red Choc, Small Choc, ChocChip, and WChocChip. It is known at the start of the study that the time required to dissolve the Small Choc is much less than the time required to dissolve any of the other candies. The null question could be: Is the time it takes to dissolve a Small Choc equal to one-half of the mean times to dissolve the Red and Blue Chocs? In this case a method that controls the FWER is in order since the condition of using a method that controls the EER does not hold; that is, it is known from the start that the mean times are not all equal. The FDR is very useful in the context of microarray experiments in genetics.

In order to avoid finding too many comparisons significant by chance alone in a single experiment, one quite often attempts to fix the experimentwise error rate, when applicable, or the FWER, when needed, at some prescribed level, such as 0.05. Whenever an experimenter is trying to answer many questions with a single experiment, it is a good strategy to control the FWER.
3.2 Recommendations

There are five basic types of multiple comparison problems: 1) comparing a set of treatments to a control or standard; 2) making all pairwise comparisons among a set of t means; 3) constructing a set of simultaneous confidence intervals or simultaneous tests of hypotheses; 4) exploratory experiments where there are numerous tests being conducted; and 5) data snooping where the comparisons are possibly data-driven. In the first four situations, the number of comparisons or confidence intervals or family of inferences is known before the data are analyzed. In the last situation, there is no set number of comparisons of interest and the final number can be very large. The recommendations given in this chapter are based on information from Westfall et al. (1999), SAS Institute, Inc. (1999), and Westfall (2002).

1) If the experiment is an exploratory or discovery study and the results are going to
be used to design a follow-up or confirmatory study, then possibly no adjustment
for multiplicity is necessary; thus use t-tests or unadjusted confidence intervals
based on LSD values.

2) Use Dunnett's procedure for comparing a set of treatments with a control. There
are two-sided and one-sided versions of Dunnett's procedure, so one can select a
version to fit the situation being considered.

3) For pairwise comparisons, if there is an equal number of observations per treatment
group, use Tukey's method. If the data are unbalanced, then use the method
that simulates a percentage point (Westfall et al., 1999), taking into account the
pattern of unequal numbers of observations.

4) If the set of linear combinations is linearly independent, then the multivariate t can
be used to construct confidence intervals or to test hypotheses. If the linear
combinations are uncorrelated or orthogonal, the multivariate t works well. If the linear
combinations are not uncorrelated, then a simulation method that incorporates the
correlation structure should be used instead of the multivariate t. Most cases with
unequal numbers of observations per treatment group provide correlated linear
combinations and the simulation method should be used.

5) The Bonferroni method can be used to construct simultaneous confidence intervals
or tests about a selected number of linear combinations of the means, but if
the number of combinations of interest is large (say 20 or more), the Scheffe
procedure can often produce shorter confidence intervals, so it is worth checking. For a
set of hypotheses, the methods of Sidak (1967), Holm (1979), or Sidak-Holm can
be used effectively. When the linear combinations are uncorrelated these
bounds are quite good, but when there are correlations among the linear
combinations, the realized FWER can be much less than desired. SAS®-MULTTEST
can be used to compute bootstrap and simulated percentage points for a given set
of comparisons that take into account the correlation among the comparisons
within the set.

6) For data snooping or for data-driven comparisons or hypotheses, use Scheffe's
procedure, as one can make as many comparisons as one wants and still control
the EER or FWER.

7) For studies such as genetic studies that involve thousands of comparisons, use a
method that controls the FDR, such as the method suggested by Benjamini and
Hochberg (1995).

8) For studies that involve evaluating the safety of a treatment as compared with a
control or placebo for possible adverse effects, use a method that does not correct
for multiple tests or comparisons. Adjustment for multiplicity may not be needed
for safety studies, where it is much more serious to make a type II error than it is
to make a type I error.

9) Once the type of comparison is determined and the desired level of error rate
control is specified, select the method satisfying these conditions that provides the
smallest p-values or smallest critical differences or shortest confidence interval
widths.
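Recommendation 7 refers to the Benjamini and Hochberg (1995) method. A minimal sketch of the usual step-up rule is given below: sort the m p-values, find the largest rank k with $p_{(k)} \le kq/m$, and reject the k hypotheses with the smallest p-values. The function name and the choice $q = 0.05$ are assumptions made for illustration, and the six p-values are the pairwise comparison p-values from Table 2.11, used only as convenient input; six comparisons is far fewer than the thousands for which FDR control is recommended.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans, True where the hypothesis is rejected at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by increasing p
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: keep the largest rank whose p-value is under its threshold
        if p_values[idx] <= rank * q / m:
            k_max = rank
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            reject[idx] = True
    return reject

# Pairwise comparison p-values in the row order of Table 2.11
p = [0.0249, 0.0018, 0.0006, 0.0332, 0.0027, 0.0536]
print(benjamini_hochberg(p))  # prints: [True, True, True, True, True, False]
```

With these inputs only the drug 2 versus no drug comparison (p = 0.0536) is not rejected.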

Each of the recommended multiple comparison procedures as well as a few other popular procedures available for the one-way treatment structure of Chapter 1 are examined in the following discussion. Each of the procedures can also be used in much more complex situations, as will be illustrated throughout the remainder of this book. The parameter $\nu$ used during the remainder of this book represents the degrees of freedom corresponding to the estimator of $\sigma^2$. For the one-way case of Chapter 1, the error degrees of freedom are $\nu = N - t$.


3.3 Least Significant Difference 

The LSD multiple comparison method has possibly been used more than any other method, perhaps because it is one of the easiest to apply. It is usually used to compare each treatment mean with every other treatment mean, but it can be used for other comparisons as well. The LSD at the $\alpha 100\%$ significance level for comparing $\mu_i$ to $\mu_j$ is

$$\text{LSD}_\alpha = t_{\alpha/2,\nu}\, \hat{\sigma} \sqrt{\frac{1}{n_i} + \frac{1}{n_j}} \qquad (3.1)$$

One concludes that $\mu_i \neq \mu_j$ if $|\hat{\mu}_i - \hat{\mu}_j| > \text{LSD}_\alpha$. This procedure has a comparisonwise error rate equal to $\alpha$. A corresponding $(1 - \alpha)100\%$ confidence interval for $\mu_i - \mu_j$ is

$$\hat{\mu}_i - \hat{\mu}_j \pm t_{\alpha/2,\nu}\, \hat{\sigma} \sqrt{\frac{1}{n_i} + \frac{1}{n_j}} \qquad (3.2)$$

If all sample sizes are equal (to n, say), then a single LSD value can be used for all pairwise comparisons. In this case, the single $\text{LSD}_\alpha$ value is given by

$$\text{LSD}_\alpha = t_{\alpha/2,\nu}\, \hat{\sigma} \sqrt{\frac{2}{n}} \qquad (3.3)$$

Suppose a study includes t treatment means and that all possible pairwise comparisons at the 5% significance level are going to be made. Comparisons of the comparisonwise and experimentwise error rates for experiments with different values of t are displayed in Table 3.1. The information in the table applies to cases where all treatment means are equal. Table 3.1 shows that, in an experiment involving six treatments, 35.8% of the time one would find at least one significant difference, even when all the treatment means were equal to one another. Obviously, using the LSD procedure could be very risky without some additional protection. When there is more than one test or parameter or linear combination of parameters of interest, meaning there is a multiplicity problem, some adjustment should be made in order to guard against discovering false results. Westfall et al. (1999) present an excellent discussion of the problems associated with multiplicity and multiple comparisons. The following discussion attempts to describe those procedures that are useful or have been used in the analysis and interpretation of the results from designed experiments.

TABLE 3.1
Simulated Error Rates for the LSD Procedure

Number of treatments        2      3      4      5      6      8      10     20
Comparisonwise error rate   0.05   0.05   0.05   0.05   0.05   0.05   0.05   0.05
Experimentwise error rate   0.05   0.118  0.198  0.280  0.358  0.469  0.586  0.904
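The growth of the experimentwise error rate in Table 3.1 can be put in context by counting the pairwise comparisons, $t(t - 1)/2$, and computing what the error rate would be if those comparisons were independent. The sketch below is a rough reference calculation only; the function names are assumptions, and the pairwise comparisons in a real experiment share treatment means and a common variance estimate, so they are not independent.

```python
def n_pairs(t):
    """Number of pairwise comparisons among t treatments."""
    return t * (t - 1) // 2

def independence_eer(t, alpha=0.05):
    """Error rate if all pairwise tests were independent, each at level alpha."""
    return 1 - (1 - alpha) ** n_pairs(t)

# Simulated experimentwise error rates copied from Table 3.1
simulated = {2: 0.05, 3: 0.118, 4: 0.198, 5: 0.280, 6: 0.358,
             8: 0.469, 10: 0.586, 20: 0.904}

for t, sim in simulated.items():
    print(t, n_pairs(t), sim, round(independence_eer(t), 3))
```

For six treatments there are 15 comparisons and the independence value is about 0.537, compared with the simulated 0.358; for every t in the table the simulated rate does not exceed the independence value.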


3.4 Fisher's LSD Procedure 

Fisher's recommendation offers some protection for the LSD procedure discussed in the preceding section. In Fisher's procedure, LSD tests are made at the $\alpha 100\%$ significance level by utilizing Equation 3.1, but only if $H_0: \mu_1 = \mu_2 = \cdots = \mu_t$ is first rejected at that level of $\alpha$ by the F-test discussed in Chapter 1.

This gives a rather large improvement over the straight LSD procedure since the experimentwise error rate is now approximately equal to $\alpha$. However, it is possible to reject $H_0: \mu_1 = \mu_2 = \cdots = \mu_t$ and not reject any of $H_0: \mu_i = \mu_j$ for $i \neq j$. It is also true that this procedure may not detect some differences between pairs of treatments when differences really exist. In other words, differences between a few pairs of treatments may exist, but equality of the remaining treatments may cause the F-test to be nonsignificant, and this procedure does not allow the experimenter to make individual comparisons without first obtaining a significant F-statistic. The other problem with this procedure is that many experiments contain treatments where it is known there are unequal means among some subsets of the treatments. In this case, one expects to reject the equal means hypothesis and the LSD would be used to make all pairwise comparisons. If a subset of the treatments has equal means, then more of the pairwise comparisons will be detected as being significantly different than expected. Thus the FWER is not maintained. Fisher's LSD can be recommended only when the complete null hypothesis is expected to be true.

These two LSD procedures are not recommended for constructing simultaneous confidence intervals on specified contrasts of the $\mu_i$ because the resulting confidence intervals obtained will generally be too narrow.
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Each of the above LSD procedures can be generalized to include several contrasts of the 
treatment means. The generalization is: conclude that £' =1 c,/;, ^ 0 if 



Examples are given in Sections 3.10, 3.12, 3.14, and 3.16. 


(3.4) 
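The two-stage logic of Fisher's procedure can be sketched in a few lines of Python. This is a minimal illustration, not the book's code: the function name, the data, and the critical values (read from standard t and F tables) are hypothetical, and the pairwise LSD is assumed to take the usual form $t_{\alpha/2,\nu}\,\hat{\sigma}\sqrt{1/n_i + 1/n_{i'}}$.

```python
import math

def fisher_protected_lsd(means, sizes, sigma_hat, f_crit, t_crit):
    """Return the pairs (i, j), numbered from 1, declared different by the protected LSD.

    means, sizes : treatment sample means and sample sizes
    sigma_hat    : pooled estimate of the error standard deviation
    f_crit       : tabled F critical value with (t - 1, nu) degrees of freedom
    t_crit       : tabled t critical value t_{alpha/2, nu}
    """
    t = len(means)
    total = sum(sizes)
    grand = sum(n * m for n, m in zip(sizes, means)) / total
    ss_treat = sum(n * (m - grand) ** 2 for n, m in zip(sizes, means))
    f_stat = ss_treat / ((t - 1) * sigma_hat ** 2)
    if f_stat <= f_crit:
        return []                      # stage 1 failed: no pairwise claims are made
    found = []
    for i in range(t):
        for j in range(i + 1, t):
            lsd = t_crit * sigma_hat * math.sqrt(1.0 / sizes[i] + 1.0 / sizes[j])
            if abs(means[i] - means[j]) > lsd:
                found.append((i + 1, j + 1))
    return found

# Hypothetical data: three treatments, eight units each, nu = 21.
pairs = fisher_protected_lsd([10.0, 12.0, 17.0], [8, 8, 8], 3.0, f_crit=3.47, t_crit=2.080)
print(pairs)
```

With these made-up numbers the F-test is significant, so the LSD comparisons are carried out; raising `f_crit` above the observed F-statistic makes the function return an empty list, which is the protection the procedure offers.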


3.5 Bonferroni's Method 

Although this procedure may be the least used, it is often the best. It is particularly good
when the experimenter wants to make a small number of comparisons. This procedure is
recommended for planned comparisons whenever it is necessary to control the FWER.
Suppose the experimenter wants to make p such comparisons. She would conclude that
the qth comparison $\sum_{i=1}^{t} c_{iq}\mu_i \ne 0$, $q = 1, 2, \ldots, p$, if

$$\left|\sum_{i=1}^{t} c_{iq}\hat{\mu}_i\right| > t_{\alpha/2p,\nu}\,\hat{\sigma}\sqrt{\sum_{i=1}^{t}\frac{c_{iq}^2}{n_i}} \tag{3.5}$$

These p tests will give a FWER less than or equal to $\alpha$ and a comparisonwise error rate
equal to $\alpha/p$. Usually the FWER is much less than $\alpha$. Unfortunately, it is not possible to
determine how much less. Values of $t_{\alpha/2p,\nu}$ for selected values of $\alpha$, p, and $\nu$ are given in the
Appendix in Table A.2. For example, if $\alpha = 0.05$, p = 5, and $\nu = 24$, then from Table A.2 one
gets $t_{\alpha/2p,\nu} = 2.80$. The examples in Sections 3.10, 3.12, 3.14, and 3.16 demonstrate the use of
the Bonferroni method. The m in Table A.2 is equivalent to our p.

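Simultaneous confidence intervals obtained from the Bonferroni method, which can be
recommended, have the form:

$$\sum_{i=1}^{t} c_{iq}\hat{\mu}_i \pm t_{\alpha/2p,\nu}\,\hat{\sigma}\sqrt{\sum_{i=1}^{t}\frac{c_{iq}^2}{n_i}}, \qquad q = 1, 2, \ldots, p \tag{3.6}$$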


The Bonferroni method can be applied to any set of functions of the parameters of a 
model, including variances as well as means. 
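
The statement that the p tests of this section have FWER no larger than $\alpha$ follows from Boole's inequality. As a brief sketch (the event notation $A_q$ is introduced here only for illustration), let $A_q$ be the event that the qth comparison is falsely declared nonzero when each test is run at level $\alpha/p$. Then

$$\text{FWER} = \Pr\left(\bigcup_{q=1}^{p} A_q\right) \le \sum_{q=1}^{p}\Pr(A_q) \le p\cdot\frac{\alpha}{p} = \alpha .$$

No independence among the comparisons is assumed, which is why the method applies to any set of functions of the parameters. The slack in the first inequality is the reason the realized FWER is usually below $\alpha$.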


3.6 Scheffe's Procedure 

This procedure is recommended whenever the experimenter wants to make a large 
number of "unplanned" comparisons. Unplanned comparisons are comparisons that the 
experimenter had not thought of making when planning the experiment. These arise 
frequently, since the results of an experiment frequently suggest certain comparisons to 
the experimenter. This procedure can also be used when there are a large number of 
planned comparisons, but the widths of the confidence intervals are generally wider than
for other procedures, although not always. Consider testing $H_0\colon \sum_{i=1}^{t} c_i\mu_i = 0$ for a given
contrast vector c. It is true that

$$\Pr\left\{\frac{\left(\sum_{i=1}^{t} c_i\hat{\mu}_i - \sum_{i=1}^{t} c_i\mu_i\right)^2}{\sum_{i=1}^{t} c_i^2/n_i} \le (t-1)F_{\alpha,t-1,\nu}\,\hat{\sigma}^2 \ \text{ for all contrast vectors } c\right\} = 1 - \alpha$$

Thus a procedure with an FWER equal to $\alpha$ for comparing all possible contrasts of the $\mu_i$ to
zero is as follows: Reject $H_0\colon \sum_{i=1}^{t} c_i\mu_i = 0$ if

$$\left|\sum_{i=1}^{t} c_i\hat{\mu}_i\right| > \sqrt{(t-1)F_{\alpha,t-1,\nu}}\;\hat{\sigma}\sqrt{\sum_{i=1}^{t}\frac{c_i^2}{n_i}} \tag{3.7}$$


This procedure allows one to compare an infinite number of contrasts to zero while
maintaining an experimentwise error rate equal to $\alpha$. However, most experimenters will
usually not be interested in an infinite number of comparisons; that is, only a finite number
of comparisons are of interest. Scheffe's procedure can still be used, but in this case, the
FWER will generally be much smaller than $\alpha$. Bonferroni's method or the multivariate
t-method when appropriate will often be better (narrower confidence interval or more
powerful test) than Scheffe's procedure for a finite number of comparisons. That is, a
smaller value of $\left|\sum_{i=1}^{t} c_i\hat{\mu}_i\right|$ can often enable one to declare that $\sum_{i=1}^{t} c_i\mu_i$ is significantly
different from zero using Bonferroni's method or the multivariate t-method than can be declared
significant by Scheffe's method. However, if one is going to "muck around" in the data to
see if anything significant turns up, then one should use Scheffe's method, since such
comparisons are really unplanned comparisons rather than planned comparisons. It should be
noted that Scheffe's method will not reveal any contrasts significantly different from zero
unless the F-test discussed in Chapter 1 rejects $H_0\colon \mu_1 = \mu_2 = \cdots = \mu_t$. Scheffe's procedure can
also be used to obtain simultaneous confidence intervals for contrasts of the $\mu_i$. The result
required is that, for any set of contrasts $c_1, c_2, \ldots$, one can be at least $(1 - \alpha)100\%$ confident
that $\sum_{i=1}^{t} c_{iq}\mu_i$ will be contained within the interval given by

$$\sum_{i=1}^{t} c_{iq}\hat{\mu}_i \pm \sqrt{(t-1)F_{\alpha,t-1,\nu}}\;\hat{\sigma}\sqrt{\sum_{i=1}^{t}\frac{c_{iq}^2}{n_i}} \quad \text{for all } q = 1, 2, \ldots \tag{3.8}$$


If one wants to consider all linear combinations of the $\mu_i$ rather than just all contrasts,
then $\sqrt{(t-1)F_{\alpha,t-1,\nu}}$ must be replaced by $\sqrt{tF_{\alpha,t,\nu}}$ in Equations 3.7 and 3.8.

Examples can be found in Sections 3.10, 3.12, and 3.14.
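
The link between Scheffe's procedure and the overall F-test can be made explicit with a short derivation. This is a sketch using the Cauchy-Schwarz inequality; the weighted grand mean $\hat{\mu}_{\cdot} = \sum_i n_i\hat{\mu}_i / \sum_i n_i$ is notation introduced here for illustration. For any contrast ($\sum_i c_i = 0$),

$$\frac{\left(\sum_{i=1}^{t} c_i\hat{\mu}_i\right)^2}{\sum_{i=1}^{t} c_i^2/n_i}
= \frac{\left(\sum_{i=1}^{t} \frac{c_i}{\sqrt{n_i}}\,\sqrt{n_i}\,(\hat{\mu}_i - \hat{\mu}_{\cdot})\right)^2}{\sum_{i=1}^{t} c_i^2/n_i}
\le \sum_{i=1}^{t} n_i(\hat{\mu}_i - \hat{\mu}_{\cdot})^2 ,$$

with equality when $c_i \propto n_i(\hat{\mu}_i - \hat{\mu}_{\cdot})$. The right-hand side is the treatment sum of squares, so some contrast exceeds the Scheffe bound $(t-1)F_{\alpha,t-1,\nu}\,\hat{\sigma}^2$ exactly when

$$\frac{\sum_{i=1}^{t} n_i(\hat{\mu}_i - \hat{\mu}_{\cdot})^2}{(t-1)\,\hat{\sigma}^2} > F_{\alpha,t-1,\nu},$$

that is, exactly when the one-way F-test rejects.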


3.7 Tukey-Kramer Method 

The preceding procedures can be used regardless of the values of the $n_i$. Tukey's (Tukey,
1952, 1953; Kramer, 1956) honest significant difference (HSD) procedure was designed to
make all pairwise comparisons among a set of means. The procedure, however, requires
equal $n_i$. Tukey (1953) and Kramer (1956) provided a modification for the case where one
has unequal sample sizes. Hayter (1984) provided proof that the Tukey-Kramer method
provides FWER protection, although an approximate procedure can be used if the $n_i$ are
not too unequal. The Tukey-Kramer method is to reject $H_0\colon \mu_i = \mu_{i'}$ for $i \ne i'$ if

$$\left|\hat{\mu}_i - \hat{\mu}_{i'}\right| > q_{\alpha,t,\nu}\sqrt{\frac{\hat{\sigma}^2}{2}\left(\frac{1}{n_i} + \frac{1}{n_{i'}}\right)} \tag{3.9}$$

where $q_{\alpha,t,\nu}$ is the upper $\alpha 100$ percentile of the distribution of the Studentized range statistic.
Values of $q_{\alpha,t,\nu}$ for selected values of $\alpha$, t, and $\nu$ are given in Appendix Table A.4.

If the sample sizes are all equal to n, then the decision is to reject $H_0\colon \mu_i = \mu_{i'}$ for $i \ne i'$ if

$$\left|\hat{\mu}_i - \hat{\mu}_{i'}\right| > q_{\alpha,t,\nu}\sqrt{\frac{\hat{\sigma}^2}{n}}$$

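Tukey's general procedure for equal sample sizes is to reject $H_0\colon \sum_{i=1}^{t} c_i\mu_i = 0$ for a
contrast if

$$\left|\sum_{i=1}^{t} c_i\hat{\mu}_i\right| > q_{\alpha,t,\nu}\,\frac{\hat{\sigma}}{\sqrt{n}}\left(\frac{1}{2}\sum_{i=1}^{t}|c_i|\right)$$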
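
As a rough numerical check of Equation 3.9, the following Python sketch computes Tukey-Kramer critical differences for two pairs of tasks from the example data used later in this chapter. The inputs are assumptions taken from those examples: $\hat{\sigma} = 5.559$ and the sample sizes 13, 12, and 10 from Section 3.16, and the quantile 2.940710 from Table 3.3, which is taken here to be $q_{\alpha,t,\nu}/\sqrt{2}$. The function name is hypothetical.

```python
import math

def tukey_kramer_critical_difference(q, sigma_hat, n_i, n_j):
    """Right-hand side of Equation 3.9 for one pair of treatments."""
    return q * math.sqrt(sigma_hat ** 2 / 2.0 * (1.0 / n_i + 1.0 / n_j))

SIGMA_HAT = 5.559                      # assumed pooled standard deviation (Section 3.16)
Q = 2.940710 * math.sqrt(2.0)          # assumed: Table 3.3 reports q / sqrt(2)

cd_12 = tukey_kramer_critical_difference(Q, SIGMA_HAT, 13, 12)   # tasks 1 and 2
cd_13 = tukey_kramer_critical_difference(Q, SIGMA_HAT, 13, 10)   # tasks 1 and 3
print(round(cd_12, 3), round(cd_13, 3))
```

Under these assumptions the two values should agree, up to rounding, with the Tukey-Kramer entries for the (1, 2) and (1, 3) rows of Table 3.4.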


3.8 Simulation Methods 

For unequal sample size problems, for problems where the comparisons are other than
pairwise comparisons, and for problems where the comparisons are not linearly
independent, the above methods provide FWER significance levels that are less than desired. In
this case, the percentage points for the appropriate set of comparisons can be simulated.

Suppose you are interested in p linear combinations of the $\mu_i$ such as $\sum_{i=1}^{t} c_{iq}\mu_i$,
$q = 1, 2, \ldots, p$, and it is desired to provide a procedure that controls the FWER for either the set
of hypotheses $H_0\colon \sum_{i=1}^{t} c_{iq}\mu_i = 0$, $q = 1, 2, \ldots, p$, or a set of simultaneous confidence intervals
for $\sum_{i=1}^{t} c_{iq}\mu_i$. The process is:

1) Generate a sample of data in the same structure of the data set at hand. If there are
five treatments with sample sizes 5, 9, 3, 6, and 7, generate data with those sample
sizes.

2) Carry out the analysis of the generated data set as is to be done with the actual
data set and compute the p t-statistics:

$$t_q = \frac{\sum_{i=1}^{t} c_{iq}\hat{\mu}_i}{\hat{\sigma}\sqrt{\sum_{i=1}^{t} c_{iq}^2/n_i}}, \qquad q = 1, 2, \ldots, p$$

3) Compute the maximum of the absolute values of the $t_q$, $T_s = \max(|t_1|, |t_2|, \ldots, |t_p|)$.

4) Repeat steps 1, 2 and 3 a very large number of times, keeping track of the
computed values of $T_s$. Determine the upper $\alpha 100$ percentile of the distribution of the
$T_s$, and denote this percentile by $T_\alpha$.

5) For the actual data set, compute $t_q$, $q = 1, 2, \ldots, p$, and reject the qth hypothesis if
$|t_q| > T_\alpha$, $q = 1, 2, \ldots, p$, or construct simultaneous confidence intervals as

$$\sum_{i=1}^{t} c_{iq}\hat{\mu}_i \pm T_\alpha\,\hat{\sigma}\sqrt{\sum_{i=1}^{t}\frac{c_{iq}^2}{n_i}}, \qquad q = 1, 2, \ldots, p$$

The accuracy of the simulation can be specified by using the method of Edwards and Berry 
(1987). SAS-MULTTEST can be used to obtain simultaneous inferences using the bootstrap 
method (Westfall et al., 1999). Bootstrap methodology is beyond the scope of this book. 
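
The five simulation steps can be sketched directly in Python with the standard library. This is a minimal illustration, not the algorithm used by SAS: the function name, seed, and number of repetitions are arbitrary choices, and the data are generated as standard normal because, under the null hypothesis with homogeneous normal errors, the $t_q$ do not depend on the common mean or on $\sigma$. The sample sizes 5, 9, 3, 6, and 7 are those mentioned in step 1, and all pairwise differences are used as the comparisons.

```python
import math
import random
from itertools import combinations

def simulate_critical_value(sizes, contrasts, alpha=0.05, reps=4000, seed=12345):
    """Estimate T_alpha, the upper alpha*100 percentile of max_q |t_q| under the null."""
    rng = random.Random(seed)
    nu = sum(sizes) - len(sizes)                      # error degrees of freedom
    # sqrt(sum_i c_iq^2 / n_i) depends only on the design, so compute it once
    scales = [math.sqrt(sum(c * c / n for c, n in zip(cq, sizes))) for cq in contrasts]
    maxima = []
    for _ in range(reps):
        means, sse = [], 0.0
        for n in sizes:                               # step 1: generate data, same structure
            y = [rng.gauss(0.0, 1.0) for _ in range(n)]
            ybar = sum(y) / n
            means.append(ybar)
            sse += sum((v - ybar) ** 2 for v in y)
        sigma_hat = math.sqrt(sse / nu)
        t_stats = [sum(c * m for c, m in zip(cq, means)) / (sigma_hat * s)
                   for cq, s in zip(contrasts, scales)]        # step 2: the p t-statistics
        maxima.append(max(abs(t) for t in t_stats))            # step 3: T_s
    maxima.sort()                                              # step 4: empirical percentile
    return maxima[int(round((1.0 - alpha) * reps)) - 1]

sizes = [5, 9, 3, 6, 7]
pairwise = []
for i, j in combinations(range(len(sizes)), 2):
    c = [0] * len(sizes)
    c[i], c[j] = 1, -1
    pairwise.append(c)

t_alpha = simulate_critical_value(sizes, pairwise)
print(f"simulated critical value: {t_alpha:.3f}")
```

Step 5 then compares each observed $|t_q|$ with `t_alpha`. The estimate varies with the seed and the number of repetitions; more repetitions give a more stable percentile.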


3.9 Sidak Procedure 

Sidak (1967) provided a modification of the Bonferroni method by using a different
percentage point for each of the comparisons. The process is to compute a t-statistic for each
of the comparisons:

$$t_q = \frac{\sum_{i=1}^{t} c_{iq}\hat{\mu}_i}{\hat{\sigma}\sqrt{\sum_{i=1}^{t} c_{iq}^2/n_i}}, \qquad q = 1, 2, \ldots, p$$

Compute the significance level for each comparison and order the significance levels
from smallest to largest as $p_1, p_2, \ldots, p_p$. For a FWER of $\alpha$, reject the individual comparison
if $p_q < 1 - (1 - \alpha)^{1/p}$ or equivalently if $\alpha > 1 - (1 - p_q)^{p}$.
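
The two equivalent forms of the Sidak rule, and its relation to the Bonferroni level $\alpha/p$, can be checked numerically. The sketch below is illustrative only; the function names are hypothetical, and the raw p-value 0.0007 with 15 comparisons is taken from the task 4 versus task 5 row of Table 3.5.

```python
def sidak_level(alpha, p):
    """Per-comparison significance level 1 - (1 - alpha)^(1/p)."""
    return 1.0 - (1.0 - alpha) ** (1.0 / p)

def bonferroni_level(alpha, p):
    """Per-comparison significance level alpha / p."""
    return alpha / p

def sidak_adjusted(p_value, p):
    """Sidak adjusted p-value 1 - (1 - p_value)^p."""
    return 1.0 - (1.0 - p_value) ** p

alpha, n_comparisons = 0.05, 15
raw = 0.0007                           # unadjusted p-value, task 4 vs task 5 (Table 3.5)

reject_by_level = raw < sidak_level(alpha, n_comparisons)
reject_by_adjusted = sidak_adjusted(raw, n_comparisons) < alpha
print(sidak_level(alpha, n_comparisons), bonferroni_level(alpha, n_comparisons))
print(reject_by_level, reject_by_adjusted)
```

The Sidak level is slightly larger than the Bonferroni level, which is consistent with the Sidak quantile in Table 3.3 being slightly smaller than the Bonferroni quantile.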



3.10 Example—Pairwise Comparisons 

The task data in Section 1.6 is used to demonstrate the results of the above multiple
comparisons procedures. Table 3.2 contains the SAS-Mixed code to fit the one-way means
model and the LSMeans statements are used to extract several of the multiple comparison
procedures. Table 3.3 contains the percentage points used to provide confidence
differences or significant differences for the simulate, Tukey-Kramer, Bonferroni, Sidak, Scheffe,
and t (unadjusted) methods. Excluding the unadjusted t, the other methods provide 0.05
type I FWER for all pairwise comparisons. The simulate and Tukey-Kramer methods use
the smallest quantiles with the Sidak and Bonferroni methods in the middle, while the
Scheffe method is largest. Table 3.4 contains the critical significant differences for each of

TABLE 3.2

SAS System Code Using Proc Mixed to Request the Computation of
Several Multiple Comparisons Procedures for All Pairwise Comparisons

PROC mixed DATA=EX1; CLASS TASK;
MODEL PULSE20=TASK/NOINT SOLUTION;
LSMEANS TASK/ DIFF CL;
LSMEANS TASK/ DIFF ADJUST=TUKEY CL;
LSMEANS TASK/ DIFF ADJUST=BON CL;
LSMEANS TASK/ DIFF ADJUST=SCHEFFE CL;
LSMEANS TASK/ DIFF ADJUST=SIDAK CL;
LSMEANS TASK/ DIFF ADJUST=SIMULATE (REPORT SEED=4938371) CL;


TABLE 3.3

Percentage Points Used for All Pairwise Comparisons of the Six
Task Means

                                Simulation Results
Method          95% Quantile    Estimated α    99% Confidence Limits
Simulate        2.932480        0.0500         0.0450    0.0550
Tukey-Kramer    2.940710        0.0486         0.0436    0.0535
Bonferroni      3.053188        0.0359         0.0316    0.0401
Sidak           3.044940        0.0370         0.0326    0.0413
Scheffe         3.437389        0.0131         0.0105    0.0157
t               1.998972        0.3556         0.3446    0.3666


TABLE 3.4

Critical Differences Used to Compare the Differences between Pairs of Means for the Unadjusted
t and Several Multiple Comparison Procedures

                          Standard                      Tukey-
TASK  _TASK  Estimate     Error     t      Bonferroni   Kramer   Scheffe  Sidak   Simulate
1     2        0.840      2.225     4.449  6.795        6.544    7.650    6.776   6.526
1     3       -3.877      2.338     4.674  7.139        6.876    8.038    7.120   6.857
1     4       -6.077      2.338     4.674  7.139        6.876    8.038    7.120   6.857
1     5        2.423      2.225     4.449  6.795        6.544    7.650    6.776   6.526
1     6        3.105      2.277     4.553  6.953        6.697    7.828    6.935   6.679
2     3       -4.717      2.380     4.758  7.267        7.000    8.182    7.248   6.980
2     4       -6.917      2.380     4.758  7.267        7.000    8.182    7.248   6.980
2     5        1.583      2.270     4.537  6.929        6.674    7.801    6.911   6.655
2     6        2.265      2.321     4.639  7.085        6.824    7.977    7.066   6.805
3     4       -2.200      2.486     4.970  7.591        7.311    8.546    7.570   7.291
3     5        6.300      2.380     4.758  7.267        7.000    8.182    7.248   6.980
3     6        6.982      2.429     4.855  7.416        7.143    8.349    7.396   7.123
4     5        8.500      2.380     4.758  7.267        7.000    8.182    7.248   6.980
4     6        9.182      2.429     4.855  7.416        7.143    8.349    7.396   7.123
5     6        0.682      2.321     4.639  7.085        6.824    7.977    7.066   6.805

TABLE 3.5

Adjusted Significance Levels to Test the Equality of All Pairwise Comparisons of TASK Minus
_TASK Obtained from Six Procedures Where t Corresponds to the Unadjusted t

TASK  _TASK  t       Bonferroni  Tukey-Kramer  Scheffe  Sidak   Simulate
1     2      0.7072  1.0000      0.9990        0.9996   1.0000  0.9990
1     3      0.1024  1.0000      0.5642        0.7378   0.8021  0.5646
1     4      0.0117  0.1751      0.1129        0.2552   0.1615  0.1111
1     5      0.2805  1.0000      0.8840        0.9446   0.9928  0.8804
1     6      0.1777  1.0000      0.7484        0.8661   0.9469  0.7501
2     3      0.0520  0.7795      0.3645        0.5642   0.5509  0.3657
2     4      0.0051  0.0761      0.0546        0.1506   0.0735  0.0545
2     5      0.4880  1.0000      0.9815        0.9923   1.0000  0.9813
2     6      0.3328  1.0000      0.9238        0.9651   0.9977  0.9234
3     4      0.3796  1.0000      0.9488        0.9772   0.9992  0.9474
3     5      0.0103  0.1543      0.1014        0.2364   0.1437  0.0985
3     6      0.0055  0.0831      0.0590        0.1596   0.0799  0.0584
4     5      0.0007  0.0104      0.0087        0.0366   0.0104  0.0090
4     6      0.0004  0.0053      0.0046        0.0219   0.0053  0.0052
5     6      0.7699  1.0000      0.9997        0.9999   1.0000  0.9998



the pairwise comparisons. The observed differences for task 1 to task 4, task 2 to task 4,
task 3 to task 5, task 3 to task 6, task 4 to task 5 and task 4 to task 6 all exceed
the critical differences for the t or LSD, which controls the comparisonwise error rate, but
not the experimentwise error rate. Only the comparisons of task 4 to task 5 and task 4 to
task 6 exceed the critical differences for the other five methods, all of which provide
experimentwise error rate protection. The magnitudes of the critical differences are smallest for
the uncorrected t or LSD method. The simulate and Tukey-Kramer critical differences are
similar in magnitude while the simulate values are a little smaller. The Sidak and Bonferroni
differences are similar in magnitude, with the Sidak values slightly smaller. The Scheffe
critical differences are largest, as is expected since they control the FWER for an infinite
number of comparisons and only 15 pairwise comparisons are made. A set of
simultaneous confidence intervals about all pairwise comparisons can be constructed by adding and
subtracting the critical difference from the estimated difference. For example, the
simultaneous 95% confidence interval about $\mu_1 - \mu_2$ using the simulate method is 0.840 ± 6.526.
Table 3.5 contains the adjusted p-values for each of the methods. The p-values provide the
same decision as the 5% critical differences in Table 3.4.
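
The decisions summarized above can be reproduced from the tabled values. The Python sketch below is illustrative: it takes the estimates and standard errors from Table 3.4 and the 95% quantiles from Table 3.3, forms each critical difference as quantile times standard error, and lists the pairs whose absolute difference exceeds it. Because the tabled values are rounded, the critical differences can differ from Table 3.4 in the last digit.

```python
# (TASK, _TASK): (estimate, standard error), transcribed from Table 3.4
PAIRS = {
    (1, 2): (0.840, 2.225), (1, 3): (-3.877, 2.338), (1, 4): (-6.077, 2.338),
    (1, 5): (2.423, 2.225), (1, 6): (3.105, 2.277), (2, 3): (-4.717, 2.380),
    (2, 4): (-6.917, 2.380), (2, 5): (1.583, 2.270), (2, 6): (2.265, 2.321),
    (3, 4): (-2.200, 2.486), (3, 5): (6.300, 2.380), (3, 6): (6.982, 2.429),
    (4, 5): (8.500, 2.380), (4, 6): (9.182, 2.429), (5, 6): (0.682, 2.321),
}

# 95% quantiles transcribed from Table 3.3
QUANTILES = {
    "t": 1.998972, "Simulate": 2.932480, "Tukey-Kramer": 2.940710,
    "Sidak": 3.044940, "Bonferroni": 3.053188, "Scheffe": 3.437389,
}

def significant_pairs(quantile):
    """Pairs whose |estimate| exceeds the critical difference quantile * SE."""
    return [pair for pair, (est, se) in PAIRS.items() if abs(est) > quantile * se]

decisions = {method: significant_pairs(q) for method, q in QUANTILES.items()}
for method, pairs in decisions.items():
    print(f"{method:13s} {pairs}")
```

The unadjusted t flags six pairs, while each of the five FWER-controlling methods flags only the task 4 to task 5 and task 4 to task 6 comparisons.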


3.11 Dunnett's Procedure 

One really interesting case is that of comparing all treatments with a control. This type of
inference is important in safety studies, where it is of interest to compare different doses
of a treatment with the control or placebo. Dunnett's test is to declare a treatment mean $\mu_i$
to be significantly different from the mean of the control $\mu_0$ if

$$\left|\hat{\mu}_i - \hat{\mu}_0\right| > d_{\alpha,t,\nu}\,\hat{\sigma}\sqrt{\frac{1}{n_i} + \frac{1}{n_0}}$$

where $d_{\alpha,t,\nu}$ is the upper $\alpha 100$ percentile of the "many-to-one t-statistic" (Miller, 1967).
Dunnett's method controls the FWER. If the sample sizes are unequal, a simulate procedure
can take into account the sample size structure and possibly provide a shorter bound.


3.12 Example—Comparing with a Control 

The task data in Section 1.6 is used to demonstrate the process of comparing each
treatment with a control. In this study, assume that task 2 is the control task and the other five
tasks are the experimental tasks. Table 3.6 contains the SAS-Mixed code to use the
unadjusted t, Bonferroni, Dunnett, Scheffe, Sidak, and Simulate methods to compare all of the
other tasks with task 2. The option DIFF=CONTROL('2') on the LSMEANS statement requests
that task 2 be considered as the control and be compared with each of the other tasks in the
study. Table 3.7 contains the 95% quantiles for each of the methods. The Dunnett quantile
is less than the others (except for the unadjusted t) with the simulate method very close.
There are five comparisons being made, which dictates the magnitude of the Bonferroni
and Sidak quantiles. The Scheffe quantile is the same as in Table 3.3, since it is intended for an
infinite number of comparisons. The only comparison where the observed difference
exceeds the critical difference is for comparing task 4 to the control or task 2 (with the
Scheffe method, no comparison exceeds its critical difference). A set of
simultaneous confidence intervals about all differences between the treatment and control
means can be constructed by adding and subtracting the critical difference in Table 3.8


TABLE 3.6

SAS System Code Using Proc Mixed to Request the Computation of Several Multiple
Comparisons Procedures for Comparing Each Task to the Means of Task 2 (Control)

PROC mixed DATA=EX1; CLASS TASK;
MODEL PULSE20=TASK/NOINT;
LSMEANS TASK/ DIFF=CONTROL('2') CL;
LSMEANS TASK/ DIFF=CONTROL('2') ADJUST=BON CL;
LSMEANS TASK/ DIFF=CONTROL('2') ADJUST=DUNNETT CL;
LSMEANS TASK/ DIFF=CONTROL('2') ADJUST=SIDAK CL;
LSMEANS TASK/ DIFF=CONTROL('2') ADJUST=SIMULATE (REPORT SEED=4938371) CL;
LSMEANS TASK/ DIFF=CONTROL('2') ADJUST=SCHEFFE CL;


TABLE 3.7

Percentage Points Used for Comparing Each Task Mean
to the Mean of Task 2 (Control)

                                      Simulation Results
Method                95% Quantile    Exact α
Simulate              2.590707        0.0494
Dunnett, two-sided    2.585505        0.0500
Bonferroni            2.657479        0.0418
Sidak                 2.649790        0.0427
Scheffe               3.437389        0.0048
t                     1.998972        0.1831

TABLE 3.8

Critical Differences for Comparing Each Treatment with the Control for the Unadjusted t
and Five Multiple Comparisons

                          Standard
TASK  _TASK  Estimate     Error     t      Bonferroni  Dunnett  Scheffe  Sidak   Simulate
1     2        0.840      2.225     4.449  5.914       5.754    7.650    5.897   5.765
3     2        4.717      2.380     4.758  6.326       6.154    8.182    6.307   6.167
4     2        6.917      2.380     4.758  6.326       6.154    8.182    6.307   6.167
5     2       -1.583      2.270     4.537  6.031       5.868    7.801    6.014   5.880
6     2       -2.265      2.321     4.639  6.167       6.000    7.977    6.149   6.012

TABLE 3.9

Adjusted Significance Levels for Testing $H_0\colon \mu_i = \mu_2$ Obtained from Six Procedures Where
Task 2 is the Control and t Corresponds to the Unadjusted t

TASK  _TASK  t       Bonferroni  Dunnett  Scheffe  Sidak   Simulate
1     2      0.7072  1.0000      0.9953   0.9996   0.9978  0.9940
3     2      0.0520  0.2598      0.1895   0.5642   0.2342  0.1899
4     2      0.0051  0.0254      0.0220   0.1506   0.0251  0.0223
5     2      0.4880  1.0000      0.9354   0.9923   0.9648  0.9299
6     2      0.3328  1.0000      0.7953   0.9651   0.8678  0.7938


from the estimated difference. For example, the simultaneous 95% confidence interval
about $\mu_1 - \mu_2$ using the simulate method is 0.840 ± 5.765. The adjusted p-values for each of
the methods are included in Table 3.9.


3.13 Multivariate t 

The multivariate t method is a good method to use when the experimenter wants to consider
a linearly independent set of linear combinations of the $\mu_i$. It is important that this procedure
not be used for a set of linear combinations that are linearly dependent. This restriction
prevents one from using this method for making all possible pairwise comparisons between a
set of treatment means, since the set of all possible comparisons between means is not a
linearly independent set. Discussion concerning linearly dependent linear combinations is
given at the end of this section. If the experimenter wants to make p linearly independent
comparisons, then she would conclude that the qth comparison $\sum_{i=1}^{t} c_{iq}\mu_i \ne 0$ if

$$\left|\sum_{i=1}^{t} c_{iq}\hat{\mu}_i\right| > t_{\alpha/2,p,\nu}\,\hat{\sigma}\sqrt{\sum_{i=1}^{t}\frac{c_{iq}^2}{n_i}} \tag{3.10}$$

where $t_{\alpha/2,p,\nu}$ is the upper $\alpha/2$ percentile of a p-variate multivariate t-distribution with $\nu$
degrees of freedom and correlation matrix $I_p$. Values of $t_{\alpha/2,p,\nu}$ are given in Appendix
Table A.3 for selected values of $\alpha$, p, and $\nu$. Simultaneous confidence intervals based on the
multivariate t method are appropriate and can be recommended. The m in Table A.3 corresponds
to our p.

The multivariate t method has an FWER less than or equal to $\alpha$. If the linear
combinations $\sum_{i=1}^{t} c_{iq}\hat{\mu}_i$ are statistically independent or uncorrelated for $q = 1, 2, \ldots, p$, then the FWER
is exactly equal to $\alpha$. For a specified set of linearly independent comparisons, the
multivariate t method will often give the best results. If the set of comparisons of interest to the
experimenter is linearly independent, then the multivariate t method will always be better
than Bonferroni's method, that is, the resulting confidence intervals will be narrower.

The following result enables one to extend the application of the multivariate t
procedure to a linearly dependent set of comparisons (however, the resulting tests or confidence
intervals are not entirely satisfactory): Let $l_1, l_2, \ldots, l_p$ be a linearly independent set of linear
combinations of the $\mu_i$. If $|l_q| \le c_q$ for $q = 1, 2, \ldots, p$, then

$$\left|\sum_{q=1}^{p} \lambda_q l_q\right| \le \sum_{q=1}^{p} |\lambda_q|\,c_q$$

To make use of this result, use the following procedure:

1) Let $l_1, l_2, \ldots, l_p$ be a linearly independent set of linear combinations of the $\mu_i$. This set is
denoted as the linear combinations of primary interest to the experimenter. For this
set of comparisons, conclude that $l_q = \sum_{i=1}^{t} c_{iq}\mu_i$ is significantly different from zero if

$$\left|\sum_{i=1}^{t} c_{iq}\hat{\mu}_i\right| > t_{\alpha/2,p,\nu}\,\hat{\sigma}\sqrt{\sum_{i=1}^{t}\frac{c_{iq}^2}{n_i}} \tag{3.11}$$

2) Let $l^*$ be any comparison that is of secondary importance, that is, a linear
combination of the $l_q$, $q = 1, 2, \ldots, p$. That is, $l^* = \sum_{q=1}^{p} \lambda_q l_q$ for some set of $\lambda_q$. One declares that
$l^*$ is significantly different from zero if

$$\left|\hat{l}^*\right| > t_{\alpha/2,p,\nu}\,\hat{\sigma}\sum_{q=1}^{p}\left(|\lambda_q|\sqrt{\sum_{i=1}^{t}\frac{c_{iq}^2}{n_i}}\right) \tag{3.12}$$

An experimenter can make as many $l^*$-type comparisons as needed without increasing the
FWER. This extension of the multivariate t method gives very powerful tests for those
comparisons of primary importance but is less powerful for those comparisons of
secondary importance. This is illustrated in an example in Section 3.16.


3.14 Example—Linearly Independent Comparisons 

Table 3.10 contains the SAS-Mixed code to provide estimates, standard errors and
confidence interval widths for five linearly independent contrasts of the task means. The five
contrasts are:

$$\mu_1 - \tfrac{1}{2}(\mu_2 + \mu_3), \qquad \tfrac{1}{3}(\mu_1 + \mu_2 + \mu_3) - \tfrac{1}{3}(\mu_4 + \mu_5 + \mu_6),$$

$$\mu_6 - \tfrac{1}{2}(\mu_4 + \mu_5), \qquad \tfrac{1}{2}(\mu_1 + \mu_6) - \tfrac{1}{2}(\mu_4 + \mu_5),$$

$$\tfrac{1}{2}(\mu_1 + \mu_6) - \tfrac{1}{4}(\mu_2 + \mu_3 + \mu_4 + \mu_5)$$

TABLE 3.10

Proc Mixed Code to Provide Estimates of Five Linearly Independent Contrasts

PROC mixed DATA=EX1; CLASS TASK;
MODEL PULSE20=TASK/NOINT SOLUTION;
estimate '1-0.5*(2+3)' task 2 -1 -1 0 0 0/divisor=2 cl;
estimate '1+2+3-4-5-6' task 1 1 1 -1 -1 -1/divisor=3 cl;
estimate '6-.5*(4+5)' task 0 0 0 -1 -1 2/divisor=2 cl;
estimate '1+6-4-5' task 1 0 0 -1 -1 1/divisor=2 cl;
estimate '1+6-.5*(2+3+4+5)' task 2 -1 -1 -1 -1 2/divisor=4 cl;


TABLE 3.11

Widths of Confidence Intervals for the Unadjusted t and Three Adjusted Methods

                                         Standard
Label                         Estimate   Error     t      Multivariate t  Bonferroni t  Scheffe
1 - 0.5 * (2 + 3)             -1.519     1.948     7.787  10.319          10.352        13.390
1 + 2 + 3 - 4 - 5 - 6          0.829     1.355     5.416   7.178           7.200         9.314
6 - 0.5 * (4 + 5)             -4.932     2.056     8.219  10.891          10.926        14.133
1 + 6 - 4 - 5                 -3.379     1.647     6.585   8.727           8.755        11.324
1 + 6 - 0.5 * (2 + 3 + 4 + 5) -3.225     1.416     5.661   7.502           7.526         9.734


Table 3.11 contains the results of computing the widths of the confidence intervals about
each of the five contrasts using the multivariate t, Bonferroni t and Scheffe methods. The
unadjusted t is also provided. The widths of the confidence intervals are computed by
2 × Q × (stderr) where Q represents the quantile of the respective procedure. The quantiles
are as follows: for the unadjusted t, $Q = t_{0.025,62} = 1.999$; for the multivariate t, $Q = t_{0.025,5,60} = 2.649$
(60 degrees of freedom were used from the table instead of 62 degrees of freedom); for the
Bonferroni t, $Q = t_{0.025/5,62} = 2.657$; and for Scheffe, $Q = \sqrt{5F_{0.05,5,62}} = 3.437$. The widths of
the confidence intervals for the multivariate t are shorter than either of those given by the
Bonferroni or Scheffe confidence intervals. The Bonferroni t confidence intervals are just a
little wider than those of the multivariate t. The Bonferroni t confidence intervals
increase in width as the number of comparisons increases. If the number of comparisons
is 47 or less, the Bonferroni confidence intervals are shorter than those of the Scheffe
method. For more than 47 comparisons, the Scheffe confidence intervals are shorter than
the Bonferroni.
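
The width computation 2 × Q × (stderr) is easy to verify. The short Python sketch below recomputes the widths from the quantiles quoted in this section and the standard errors in Table 3.11; because both inputs are rounded to three decimals, the results can differ from the table by a few units in the third decimal place. The dictionary names are illustrative.

```python
# Quantiles quoted in the text for each method
QUANTILES = {"t": 1.999, "Multivariate t": 2.649, "Bonferroni t": 2.657, "Scheffe": 3.437}

# Standard errors of the five contrasts, transcribed from Table 3.11
STDERR = {
    "1 - 0.5*(2+3)": 1.948,
    "1+2+3-4-5-6": 1.355,
    "6 - 0.5*(4+5)": 2.056,
    "1+6-4-5": 1.647,
    "1+6 - 0.5*(2+3+4+5)": 1.416,
}

def ci_width(quantile, stderr):
    """Width of a two-sided interval: estimate +/- quantile * stderr."""
    return 2.0 * quantile * stderr

widths = {
    label: {method: ci_width(q, se) for method, q in QUANTILES.items()}
    for label, se in STDERR.items()
}
for label, row in widths.items():
    print(f"{label:22s}", "  ".join(f"{w:7.3f}" for w in row.values()))
```

For every contrast the ordering of the widths is unadjusted t, multivariate t, Bonferroni t, Scheffe, from narrowest to widest.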


3.15 Sequential Rejective Methods 

The previous methods are called single-step methods as only one step is needed to
determine the proper critical value for all tests. Each individual test or confidence level is
evaluated without reference to the others. In the stepwise or sequential methods, the result of a
given test depends on the results of previous tests (Westfall et al., 1999). These methods
can be applied to a set of tests of hypotheses. Assume there are p hypotheses of interest,
$H_{0q}\colon \sum_{i=1}^{t} c_{iq}\mu_i = 0$ vs $H_{aq}\colon \sum_{i=1}^{t} c_{iq}\mu_i \ne 0$, $q = 1, 2, \ldots, p$. Compute the p t-statistics

$$t_q = \frac{\sum_{i=1}^{t} c_{iq}\hat{\mu}_i}{\hat{\sigma}\sqrt{\sum_{i=1}^{t} c_{iq}^2/n_i}}, \qquad q = 1, 2, \ldots, p$$

and let $p_1, p_2, \ldots, p_p$ denote the observed significance levels. Next order the $p_q$ from smallest
to largest as $p_{(1)} \le p_{(2)} \le \cdots \le p_{(p)}$.

3.15.1 Bonferroni-Holm Method 

The Bonferroni-Holm method starts with the Bonferroni adjustment for the first test,
but increases the significance level for tests that follow (Holm, 1979). If $p_{(1)} > \alpha/p$, then fail
to reject all of the hypotheses. If $p_{(1)} < \alpha/p$, then reject the hypothesis corresponding to $p_{(1)}$
and compare $p_{(2)}$ to $\alpha/(p - 1)$. If $p_{(2)} > \alpha/(p - 1)$, then fail to reject all of the remaining
hypotheses. If $p_{(2)} < \alpha/(p - 1)$, then reject the hypothesis corresponding to $p_{(2)}$ and compare $p_{(3)}$ to
$\alpha/(p - 2)$. Continue the procedure until for some q, $p_{(q)} > \alpha/(p - q + 1)$, where one fails to
reject the hypothesis corresponding to $p_{(q)}$ and all of the remaining hypotheses. This method
controls the FWER and is more powerful than the Bonferroni method since the ordered
p-values are compared with larger values, except for the first one.
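
The step-down rule just described can be sketched in Python. The function name is hypothetical and ties with the threshold are treated as nonrejections, which is an assumption since the text does not address equality. The first set of p-values is the Raw column reported for eight contrasts in Table 3.13; the second is a made-up set chosen to show the gain over the single-step Bonferroni method.

```python
def holm_reject(p_values, alpha=0.05):
    """Bonferroni-Holm decisions, returned in the original order of p_values."""
    p = len(p_values)
    order = sorted(range(p), key=lambda i: p_values[i])    # indices, smallest p first
    reject = [False] * p
    for step, idx in enumerate(order):                     # step = q - 1
        if p_values[idx] >= alpha / (p - step):            # compare p_(q) with alpha/(p - q + 1)
            break                                          # retain this and all remaining
        reject[idx] = True
    return reject

# Raw p-values of the eight contrasts, transcribed from Table 3.13
RAW = [0.7072, 0.4386, 0.1024, 0.5426, 0.0007, 0.0195, 0.0444, 0.0262]
task_decisions = holm_reject(RAW)

# Hypothetical p-values: Holm rejects all three, Bonferroni only the first
toy = [0.01, 0.02, 0.03]
holm_toy = holm_reject(toy)
bonferroni_toy = [pv < 0.05 / len(toy) for pv in toy]
print(task_decisions, holm_toy, bonferroni_toy)
```

For the eight contrasts only the fifth (task 4 minus task 5) is rejected at the 0.05 level, since the second smallest p-value, 0.0195, is larger than 0.05/7.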

3.15.2 Sidak-Holm Method 

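The Holm modification used in the Bonferroni-Holm method can be used with the Sidak
method where the adjusted p-values for k comparisons are

$$\begin{aligned}
\tilde{p}_{(1)} &= 1 - (1 - p_{(1)})^{k} \\
\tilde{p}_{(2)} &= \max\left[\tilde{p}_{(1)},\; 1 - (1 - p_{(2)})^{k-1}\right] \\
&\;\;\vdots \\
\tilde{p}_{(j)} &= \max\left[\tilde{p}_{(j-1)},\; 1 - (1 - p_{(j)})^{k-j+1}\right] \\
&\;\;\vdots \\
\tilde{p}_{(k)} &= \max\left(\tilde{p}_{(k-1)},\; p_{(k)}\right)
\end{aligned}$$

The maximum keeps the adjusted p-values in the same order as the raw p-values. It is the
form that reproduces the Stepdown Sidak column of Table 3.13, where the contrast with raw
p-value 0.7072 has the adjusted value 0.8230.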

This method provides control of the FWER when the comparisons are independent and 
for most situations involving dependent comparisons (Holland and Copenhaver, 1987). 
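
A minimal Python sketch of the Sidak-Holm adjusted p-values follows. It carries a running maximum through the ordered p-values so that the adjusted values never decrease, which is the convention that reproduces the Stepdown Sidak column of Table 3.13. The function name is hypothetical, and the raw p-values are those reported for the eight contrasts in Table 3.13.

```python
def sidak_holm_adjust(p_values):
    """Step-down Sidak adjusted p-values, returned in the original order."""
    k = len(p_values)
    order = sorted(range(k), key=lambda i: p_values[i])    # indices, smallest p first
    adjusted = [0.0] * k
    running_max = 0.0
    for j, idx in enumerate(order, start=1):               # j = rank of the p-value
        candidate = 1.0 - (1.0 - p_values[idx]) ** (k - j + 1)
        running_max = max(running_max, candidate)          # keep adjusted values monotone
        adjusted[idx] = running_max
    return adjusted

# Raw p-values of the eight contrasts, transcribed from Table 3.13
RAW = [0.7072, 0.4386, 0.1024, 0.5426, 0.0007, 0.0195, 0.0444, 0.0262]
stepdown_sidak = sidak_holm_adjust(RAW)
print([round(v, 4) for v in stepdown_sidak])
```

Because the raw p-values in the table are rounded to four decimals, the computed values can differ from the tabled ones in the fourth decimal place.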

3.15.3 Benjamini and Hochberg Method to Control FDR 

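Assume there are p comparisons of interest, $\sum_{i=1}^{t} c_{iq}\mu_i$, $q = 1, 2, \ldots, p$. Compute the p t-statistics

$$t_q = \frac{\sum_{i=1}^{t} c_{iq}\hat{\mu}_i}{\hat{\sigma}\sqrt{\sum_{i=1}^{t} c_{iq}^2/n_i}}, \qquad q = 1, 2, \ldots, p$$

and let $p_1, p_2, \ldots, p_p$ denote the observed significance levels. Next order the $p_q$ from smallest
to largest as $p_{(1)} \le p_{(2)} \le \cdots \le p_{(p)}$. The adjusted p-values that control the FDR (but not the
FWER) are, writing k for the number of comparisons,

$$\begin{aligned}
\tilde{p}_{(k)} &= p_{(k)} \\
\tilde{p}_{(k-1)} &= \min\left\{\tilde{p}_{(k)},\; [k/(k-1)]\,p_{(k-1)}\right\} \\
&\;\;\vdots \\
\tilde{p}_{(k-j)} &= \min\left\{\tilde{p}_{(k-j+1)},\; [k/(k-j)]\,p_{(k-j)}\right\} \\
&\;\;\vdots \\
\tilde{p}_{(1)} &= \min\left(\tilde{p}_{(2)},\; k\,p_{(1)}\right)
\end{aligned}$$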
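
The recursion for the FDR-adjusted p-values runs from the largest p-value down to the smallest. A short Python sketch, with a hypothetical function name and the raw p-values of the eight contrasts reported in Table 3.13:

```python
def benjamini_hochberg_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    k = len(p_values)
    order = sorted(range(k), key=lambda i: p_values[i], reverse=True)   # largest p first
    adjusted = [0.0] * k
    running_min = 1.0
    for step, idx in enumerate(order):
        rank = k - step                                    # rank among the ordered p-values
        running_min = min(running_min, p_values[idx] * k / rank)
        adjusted[idx] = running_min
    return adjusted

# Raw p-values of the eight contrasts, transcribed from Table 3.13
RAW = [0.7072, 0.4386, 0.1024, 0.5426, 0.0007, 0.0195, 0.0444, 0.0262]
fdr = benjamini_hochberg_adjust(RAW)
print([round(v, 4) for v in fdr])
```

The results should match the False Discovery Rate column of Table 3.13 up to rounding of the raw p-values.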


3.16 Example—Linearly Dependent Comparisons 

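Eight linearly dependent comparisons were selected and the unadjusted t, Bonferroni t,
Bonferroni-Holm, Sidak, Sidak-Holm, and Benjamini-Hochberg methods were used to
compute the adjusted significance levels. The eight comparisons are:

$$\mu_1 - \mu_2, \qquad \mu_1 - \tfrac{1}{2}(\mu_2 + \mu_3), \qquad \mu_1 - \mu_3,$$

$$\tfrac{1}{3}(\mu_1 + \mu_2 + \mu_3) - \tfrac{1}{3}(\mu_4 + \mu_5 + \mu_6), \qquad \mu_4 - \mu_5,$$

$$\mu_6 - \tfrac{1}{2}(\mu_4 + \mu_5), \qquad \tfrac{1}{2}(\mu_1 + \mu_6) - \tfrac{1}{2}(\mu_4 + \mu_5),$$

$$\tfrac{1}{2}(\mu_1 + \mu_6) - \tfrac{1}{4}(\mu_2 + \mu_3 + \mu_4 + \mu_5)$$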

Table 3.12 contains the SAS-MULTTEST code to provide the adjusted p-values for these 
comparisons. The adjusted p-values are displayed in Table 3.13. The column labeled Raw 


TABLE 3.12

Proc MULTTEST Code to Obtain Simultaneous Test Information for Eight
Contrasts

proc multtest data=ex1 pval bonferroni holm sidak stepsid FDR;
class task;
test mean(pulse20);
contrast '1-2' 1 -1 0 0 0 0;
contrast '1-.5*(2+3)' 2 -1 -1 0 0 0;
contrast '1-3' 1 0 -1 0 0 0;
contrast '1+2+3-4-5-6' 1 1 1 -1 -1 -1;
contrast '4-5' 0 0 0 1 -1 0;
contrast '6-.5*(4+5)' 0 0 0 -1 -1 2;
contrast '1+6-4-5' 1 0 0 -1 -1 1;
contrast '1+6-.5*(2+3+4+5)' 2 -1 -1 -1 -1 2;

TABLE 3.13

Raw and Adjusted p-Values for the Eight Contrasts

                                                   Stepdown             Stepdown  False
Contrast                       Raw     Bonferroni  Bonferroni  Sidak    Sidak     Discovery Rate
1 - 2                          0.7072  1.0000      1.0000      0.9999   0.8230    0.7072
1 - 0.5 * (2 + 3)              0.4386  1.0000      1.0000      0.9901   0.8230    0.5847
1 - 3                          0.1024  0.8190      0.4095      0.5785   0.3508    0.1638
1 + 2 + 3 - 4 - 5 - 6          0.5426  1.0000      1.0000      0.9981   0.8230    0.6202
4 - 5                          0.0007  0.0056      0.0056      0.0055   0.0055    0.0056
6 - 0.5 * (4 + 5)              0.0195  0.1557      0.1362      0.1455   0.1285    0.0699
1 + 6 - 4 - 5                  0.0444  0.3555      0.2222      0.3048   0.2033    0.0889
1 + 6 - 0.5 * (2 + 3 + 4 + 5)  0.0262  0.2096      0.1572      0.1914   0.1473    0.0699


contains the unadjusted t p-values. The Bonferroni column contains the Bonferroni 
adjusted p-values. The column Stepdown Bonferroni contains the Bonferroni-Holm 
adjusted p-values. The column Sidak contains the Sidak adjusted p-values. The column 
Stepdown Sidak contains the Sidak-Holm adjusted p-values. The column False Discovery 
Rate includes the Benjamini-Hochberg adjusted p-values. The Sidak adjusted p-values are 
generally smaller than those for the Bonferroni t. The stepdown Sidak adjusted p-values 
are generally smaller than those of the stepdown Bonferroni, and are smaller than the 
Sidak adjusted p-values. The Bonferroni, stepdown Bonferroni, Sidak, and stepdown Sidak 
methods provide values that are larger than those for the false discovery rate method. The 
false discovery rate would usually be used when there are numerous tests being made and 
it is not of interest to control the FWER. 

To demonstrate the application of the extension of the multivariate t to dependent
comparisons, a confidence interval will be computed about $\mu_1 - \tfrac{1}{2}(\mu_2 + \mu_3)$ given that the
linearly independent or primary set contains $\mu_1 - \mu_2$ and $\mu_1 - \mu_3$. In this case, in terms of the
results in Section 3.13, $l^* = \mu_1 - \tfrac{1}{2}(\mu_2 + \mu_3) = \tfrac{1}{2}(\mu_1 - \mu_2) + \tfrac{1}{2}(\mu_1 - \mu_3)$
and $\lambda_1 = \lambda_2 = \tfrac{1}{2}$. The critical difference used to evaluate $l^*$ (from Equation 3.12) is

$$(2.649)(5.559)\left[\frac{1}{2}\sqrt{\frac{1}{13} + \frac{1}{12}} + \frac{1}{2}\sqrt{\frac{1}{13} + \frac{1}{10}}\right] = \frac{1}{2}(5.895) + \frac{1}{2}(6.194) = 6.044$$


Multiplying this result by 2 provides the width of a confidence interval as 12.088. The width
of the confidence interval when $\mu_1 - \tfrac{1}{2}(\mu_2 + \mu_3)$ was one of the linearly independent set
was 10.319, 10.352, and 13.390 for the multivariate t, Bonferroni t and Scheffe methods,
respectively. The set of linearly independent comparisons is called the primary
comparisons and those that are linear functions of those in the primary set are called secondary
comparisons. Using Equation 3.12 to provide critical differences for secondary
comparisons yields values that are considerably larger than would be obtained as a primary
comparison, but in this case the width is still shorter than that provided by the Scheffe
method. Thus, comparisons that are of secondary importance when using the multivariate
t method are much less sensitive than those using any of the other methods, except for the
Scheffe method. Similar results hold for all other pairs of means.
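
The secondary-comparison critical difference computed in this section can be checked with a few lines of Python. This is an illustrative sketch of the Equation 3.12 arithmetic; the function name is hypothetical, and the quantile 2.649, the estimate 5.559, and the sample sizes 13, 12, and 10 are the values appearing in the computation above.

```python
import math

def secondary_critical_difference(t_quantile, sigma_hat, lambdas, primary_contrasts, sizes):
    """t * sigma_hat * sum_q |lambda_q| * sqrt(sum_i c_iq^2 / n_i)."""
    total = 0.0
    for lam, contrast in zip(lambdas, primary_contrasts):
        scale = math.sqrt(sum(c * c / n for c, n in zip(contrast, sizes)))
        total += abs(lam) * scale
    return t_quantile * sigma_hat * total

sizes = [13, 12, 10]                    # n_1, n_2, n_3
primary = [[1, -1, 0], [1, 0, -1]]      # mu_1 - mu_2 and mu_1 - mu_3
lambdas = [0.5, 0.5]                    # l* = 0.5 * l_1 + 0.5 * l_2

cd = secondary_critical_difference(2.649, 5.559, lambdas, primary, sizes)
print(round(cd, 3), round(2 * cd, 3))
```

The critical difference and the resulting interval width should agree with 6.044 and 12.088 up to rounding in the last digit.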


3.17 Multiple Range Tests 

Duncan's and Student-Newman-Keul's multiple range tests do not control the FWER and thus 
cannot be recommended. See Westfall et al. (1999, p. 154) for a discussion. These procedures are 
described next since they have been used extensively by researchers in many areas. 


3.17.1 Student-Newman-Keul's Method 


This method requires that the $n_i$ be equal to one another; however, a good approximation
can be obtained provided that the sample sizes are not too unequal. In this case, the
variable n in the following formulas is replaced by the harmonic mean of the $n_i$. The harmonic
mean is given by

$$\tilde{n} = t\left(\frac{1}{n_1} + \frac{1}{n_2} + \cdots + \frac{1}{n_t}\right)^{-1}$$

The procedure controls the EERC under the all means equal hypothesis, but not if some 
means are equal and some means are not equal. 
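
As a small illustration of the harmonic mean used in place of n, the following Python sketch evaluates the formula for a hypothetical set of mildly unequal sample sizes (the sizes are made up and do not come from the task data) and compares it with the standard library's `statistics.harmonic_mean`.

```python
import statistics

def harmonic_mean_sample_size(sizes):
    """t divided by the sum of reciprocals of the n_i."""
    return len(sizes) / sum(1.0 / n for n in sizes)

sizes = [10, 12, 11, 9]                 # hypothetical group sizes
n_tilde = harmonic_mean_sample_size(sizes)
print(round(n_tilde, 3), round(statistics.harmonic_mean(sizes), 3))
```

The harmonic mean is never larger than the arithmetic mean, so using it in place of n gives slightly wider critical ranges than the arithmetic mean would.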

To apply this method, rank the $t$ means in ascending order as $\bar{y}_{(1)} \le \bar{y}_{(2)} \le \cdots \le \bar{y}_{(t)}$. Next, the
Studentized range across the $t$ means is

$$\frac{\bar{y}_{(t)} - \bar{y}_{(1)}}{\hat{\sigma}/\sqrt{n}}$$

and this is compared with the critical point $q_{\alpha,t,\nu}$. If

$$\bar{y}_{(t)} - \bar{y}_{(1)} \le \frac{\hat{\sigma}}{\sqrt{n}}\, q_{\alpha,t,\nu}$$

then one concludes that there are no significant differences among the $t$ treatment means
and no further tests are considered. If

$$\bar{y}_{(t)} - \bar{y}_{(1)} > \frac{\hat{\sigma}}{\sqrt{n}}\, q_{\alpha,t,\nu}$$

then one concludes that $\mu_{(t)} > \mu_{(1)}$, or that there is a significant range over the $t$ means. Next, carry
out comparisons using the two subsets of means $\{\bar{y}_{(t)}, \bar{y}_{(t-1)}, \ldots, \bar{y}_{(2)}\}$ and $\{\bar{y}_{(t-1)}, \bar{y}_{(t-2)}, \ldots, \bar{y}_{(1)}\}$.
One compares the range of each set with $(\hat{\sigma}/\sqrt{n})\, q_{\alpha,t-1,\nu}$. If

$$\bar{y}_{(t)} - \bar{y}_{(2)} \le \frac{\hat{\sigma}}{\sqrt{n}}\, q_{\alpha,t-1,\nu}$$

then one concludes there are no differences among the treatment means in the subset
given by $\{\mu_{(t)}, \mu_{(t-1)}, \ldots, \mu_{(2)}\}$. Likewise if

$$\bar{y}_{(t-1)} - \bar{y}_{(1)} \le \frac{\hat{\sigma}}{\sqrt{n}}\, q_{\alpha,t-1,\nu}$$

then one concludes there are no differences among the treatment means in the subset
given by $\{\mu_{(t-1)}, \mu_{(t-2)}, \ldots, \mu_{(1)}\}$. If both of the above conclusions hold, then one stops the
process of making comparisons among means and concludes that the means given in
each of the two subsets of means are not significantly different.

If

$$\bar{y}_{(t)} - \bar{y}_{(2)} > \frac{\hat{\sigma}}{\sqrt{n}}\, q_{\alpha,t-1,\nu}$$

then one concludes that $\mu_{(t)} > \mu_{(2)}$. Likewise if

$$\bar{y}_{(t-1)} - \bar{y}_{(1)} > \frac{\hat{\sigma}}{\sqrt{n}}\, q_{\alpha,t-1,\nu}$$

one concludes that $\mu_{(t-1)} > \mu_{(1)}$. If either of these two conclusions holds, then one would
carry out additional comparisons within those subsets where significant differences are
found. For example, if both of the preceding sets contained significant differences, then
one would consider the subsets given by $\{\bar{y}_{(t)}, \bar{y}_{(t-1)}, \ldots, \bar{y}_{(3)}\}$, $\{\bar{y}_{(t-1)}, \bar{y}_{(t-2)}, \ldots, \bar{y}_{(2)}\}$, and
$\{\bar{y}_{(t-2)}, \bar{y}_{(t-3)}, \ldots, \bar{y}_{(1)}\}$. The range of each would be compared with $(\hat{\sigma}/\sqrt{n})\, q_{\alpha,t-2,\nu}$. One continues
to examine smaller subsets of means as long as the previous subset had a significant range.
Each time a range proves nonsignificant, the means involved are included in a single
group. No subset of means grouped in a nonsignificant group can later be deemed
significant; that is, no further tests should be carried out on means that have previously
been grouped into a common subgroup. When all range tests prove nonsignificant, the
procedure is complete. Any two means grouped in the same group are not significantly
different; otherwise, they are significantly different.

The method is illustrated using the task data from the example in Section 1.6 using
$\alpha = 0.05$. Since the sample sizes are not equal, the harmonic mean of the sample sizes,
11.23, is used. First rank the means in ascending order, as

Task    6       5       2       1       3       4
Mean    28.818  29.500  31.083  31.923  35.800  38.000
Rank    1       2       3       4       5       6

The first step is to compare across the range of six means, that is, to compare
38.000 - 28.818 = 9.182 with

$$q_{0.05,6,62}\sqrt{\frac{30.9045}{11.23}} = 6.900$$

Since 9.182 > 6.900, one examines the two subsets of t - 1 = 5 means. In this case, compare
both 35.800 - 28.818 = 6.982 and 38.000 - 29.500 = 8.500 with

$$q_{0.05,5,62}\sqrt{\frac{30.9045}{11.23}} = 6.593$$


Since 6.982 > 6.593 and 8.500 > 6.593, next look at subsets of t - 2 = 4 means, of which there
are three. Here one compares 31.923 - 28.818 = 3.105, 35.800 - 29.500 = 6.300, and
38.000 - 31.083 = 6.917 with

$$q_{0.05,4,62}\sqrt{\frac{30.9045}{11.23}} = (3.74)\sqrt{\frac{30.9045}{11.23}} = 6.195$$


Since 3.105 < 6.195, the first four means are grouped in a single group, and since 6.300 > 6.195
and 6.917 > 6.195, both of the remaining groups of four means must be further subdivided
into groups of three means. Before proceeding with this next step, consider the following
schematic diagram, which illustrates the present position; the line indicates that the first
four means form a group and are not considered to be different:

28.818  29.500  31.083  31.923  35.800  38.000
------------------------------


Two of the subsets of three means, 28.818-31.083 and 29.500-31.923, lie within the group of
four means already formed and so have already been grouped together.
Hence, the ranges 31.083 - 28.818 = 2.265 and 31.923 - 29.500 = 2.423 are not compared with
a critical point. The ranges that still must be compared are 35.800 - 31.083 = 4.717 and
38.000 - 31.923 = 6.077; these must be compared with

$$q_{0.05,3,62}\sqrt{\frac{30.9045}{11.23}} = 5.635$$


Since 4.717 < 5.635, these three means are now grouped in a single group, whereas since
6.077 > 5.635, the second group must be further subdivided into groups of two means each.
The following diagram illustrates the present position:

28.818  29.500  31.083  31.923  35.800  38.000
------------------------------
                ----------------------


There is only one subset of size two that has not already been combined into a common
group, that being {35.800, 38.000}. The range 38.000 - 35.800 = 2.200 is compared with

$$q_{0.05,2,62}\sqrt{\frac{30.9045}{11.23}} = 4.691$$

Since 2.200 < 4.691, these final two means are grouped together. The final diagram is:

28.818  29.500  31.083  31.923  35.800  38.000
------------------------------
                ----------------------
                                --------------


Another way to illustrate this information is to label means with the same letter if they
occur in the same group and with different letters if they occur in different groups. Thus,
the comparisons of the task data means can be represented as

Task    Mean      Grouping
1       31.923    bc
2       31.083    bc
3       35.800    ab
4       38.000    a
5       29.500    c
6       28.818    c
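
The step-down logic of this section can be sketched in a few lines of Python. This is only an illustrative sketch, not the SAS implementation: the function name `range_test_groups` is ours, and the critical ranges are supplied by the caller (here the five values obtained in the worked example) rather than computed from Studentized range tables.

```python
def range_test_groups(means, critical_range):
    """Step-down multiple range test on a set of sample means.

    means: the sample means, in any order.
    critical_range: dict mapping a subset size p (2..t) to the critical
        range for p means, i.e. (sigma_hat / sqrt(n)) * q for that p.
    Returns the groups of means that are not significantly different,
    each as a list of sorted means.
    """
    ordered = sorted(means)
    t = len(ordered)
    groups = []  # (lo, hi) index pairs of nonsignificant runs
    for p in range(t, 1, -1):  # subset sizes t, t-1, ..., 2
        for lo in range(t - p + 1):
            hi = lo + p - 1
            # Never retest a subset that lies inside a nonsignificant group.
            if any(g_lo <= lo and hi <= g_hi for g_lo, g_hi in groups):
                continue
            if ordered[hi] - ordered[lo] <= critical_range[p]:
                groups.append((lo, hi))
    # A mean that falls in no group is significantly different from the rest.
    for k in range(t):
        if not any(g_lo <= k <= g_hi for g_lo, g_hi in groups):
            groups.append((k, k))
    return [ordered[lo:hi + 1] for lo, hi in sorted(groups)]


task_means = [31.923, 31.083, 35.800, 38.000, 29.500, 28.818]
snk_critical = {2: 4.691, 3: 5.635, 4: 6.195, 5: 6.593, 6: 6.900}
for group in range_test_groups(task_means, snk_critical):
    print(group)
```

For the task data the sketch returns three overlapping groups, corresponding to the letters c, b, and a in the table above.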


3.17.2 Duncan's New Multiple Range Method 

At present, this procedure is generally referred to as Duncan's method. It is one of the more
popular methods, partly because it is often easier to find significant differences using this
method than by any other, except perhaps Fisher's LSD method. But while Duncan's
method controls the comparisonwise error rate, it does not control the FWER. This procedure
also requires equal $n_i$ and, as in the preceding section, the variable $n$ in the following
formulas can be replaced by the harmonic mean $\tilde{n}$ to give an approximate procedure if the
sample sizes are not too unequal.

Application of this procedure is similar to the application of the Student-Newman-Keuls
method except that the Studentized range critical point for comparing a group of $p$ means,
$q_{\alpha,p,\nu}$, is replaced by $q_{\alpha_p,p,\nu}$, where $\alpha_p = 1 - (1 - \alpha)^{p-1}$. Values of $q_{\alpha_p,p,\nu}$ are given in the Appendix
Table A.5. For the data in the preceding section, this procedure is applied as follows:

1) Compare 38.000 - 28.818 = 9.182 with

$$q_{\alpha_6,6,62}\,\frac{\hat{\sigma}}{\sqrt{\tilde{n}}} = (3.198)\sqrt{\frac{30.9045}{11.23}} = 5.303$$

where $\alpha_6 = 1 - (1 - 0.05)^{6-1} = 0.226$. The range of six means is significant.

2) Compare 35.800 - 28.818 = 6.982 and 38.000 - 29.500 = 8.500 with

$$q_{\alpha_5,5,62}\,\frac{\hat{\sigma}}{\sqrt{\tilde{n}}} = (3.143)\sqrt{\frac{30.9045}{11.23}} = 5.214$$

where $\alpha_5 = 1 - (1 - 0.05)^{5-1} = 0.185$. Both ranges are significant.

3) Compare 31.923 - 28.818 = 3.105, 35.800 - 29.500 = 6.300, and 38.000 - 31.083 =
6.917 with (3.073)(1.659) = 5.098. The latter two ranges are significant, while the
first range is not. The grouping at this point is

28.818  29.500  31.083  31.923  35.800  38.000
------------------------------

4) Compare 35.800 - 31.083 = 4.717 and 38.000 - 31.923 = 6.077 to (2.976)(1.659) = 4.937.
The second range is significant, while the first range is not. The groupings are
now:

28.818  29.500  31.083  31.923  35.800  38.000
------------------------------
                ----------------------

5) Compare 38.000 - 35.800 = 2.200 to (2.829)(1.659) = 4.693. The range is not significant.
The final diagram is:

28.818  29.500  31.083  31.923  35.800  38.000
------------------------------
                ----------------------
                                --------------


In this case the Student-Newman-Keuls method and Duncan's method yield the same
diagram; however, they often do not, since they use different quantiles of the Studentized
range. Duncan's method compares the ranges of means with smaller critical values, which
is the reason it does not control the FWER.
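
The adjusted levels and critical ranges used in steps 1 through 5 can be reproduced with a short Python sketch. The function names are ours; the quantiles in `duncan_q` are the values quoted in the steps above (they are not computed here), and the mean square and harmonic mean are the rounded values 30.9045 and 11.23 used in the text.

```python
import math


def duncan_alpha(alpha, p):
    """Duncan's level for a range of p means: 1 - (1 - alpha)**(p - 1)."""
    return 1.0 - (1.0 - alpha) ** (p - 1)


def critical_range(q, mse, n):
    """Critical range q * sqrt(MSE / n) for a Studentized range quantile q."""
    return q * math.sqrt(mse / n)


# Quantiles quoted in steps 1-5 for p = 6, 5, 4, 3, 2 means.
duncan_q = {6: 3.198, 5: 3.143, 4: 3.073, 3: 2.976, 2: 2.829}
for p in sorted(duncan_q, reverse=True):
    level = duncan_alpha(0.05, p)
    crit = critical_range(duncan_q[p], 30.9045, 11.23)
    print(p, round(level, 4), round(crit, 2))
```

Because the level grows with p, the quantile used for a wide range is smaller than the corresponding Student-Newman-Keuls quantile, which is the source of the smaller critical ranges.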


3.18 Waller-Duncan Procedure 

This procedure is not applicable to most messy data situations because it also requires
equal $n_i$, but it is implemented using the harmonic mean of the sample sizes when they are
unequal. The procedure is included for several reasons, but primarily because
it is not well known and because it seems to have some desirable properties,
which are discussed later. The Waller-Duncan procedure uses the sample data to help
determine whether a conservative rule (like Tukey-Kramer) or a nonconservative rule
(like Fisher's LSD) is needed to make pairwise comparisons among the treatment means.
The procedure makes use of the computed value of the F-test for testing $H_0\colon \mu_1 = \mu_2 = \cdots = \mu_t$.
If the F-value is small, then the sample data tend to indicate that the means are homogeneous.
In this case, the Waller-Duncan procedure requires a large absolute difference in
sample means in order to declare significance, so as to prevent declaring too many differences
as being significant. If the F-value is large, the sample data would tend to indicate
that the means are heterogeneous. In this case, the procedure requires a smaller absolute
difference in the sample means in order to declare significance, so as to prevent declaring
too few differences significant.

The Waller-Duncan procedure requires a constant K, called the error rate ratio, to be
chosen, which designates the seriousness of a type I error relative to a type II error. The
relationship between K and the probability of a type I error is approximately given below:

             Typical Values
$\alpha$     0.10    0.05    0.01
K            50      100     500


Thus, the critical points for this procedure depend on K, $\nu$, t, and the computed value of
the F-statistic for testing $H_0\colon \mu_1 = \mu_2 = \cdots = \mu_t$. Tables have not been included here; they are
given in Ott (1988). To use the procedure, one calculates a Waller-Duncan LSD and compares
all pairs of means to this single LSD value, just as one would do with Fisher's LSD
procedure or the Tukey-Kramer HSD procedure for the equal-sample-size problem.
When one requests the Means/Waller option in SAS-GLM, the Waller-Duncan LSD
value is computed and the means are grouped into diagrams similar to those given in the
last two sections.


3.19 Example—Multiple Range for Pairwise Comparisons 

The Duncan, Student-Newman-Keuls, and Waller-Duncan methods can be carried
out using the Means statement of SAS-GLM. Table 3.14 contains the GLM code to fit
the one-way model to the task data set; the Means statements are included to provide
the three methods for comparing the six task means. Table 3.15 contains the results for
Duncan's multiple comparison method. The critical range values are computed and
presented to enable one to look at a range of means and determine whether they are different. For
example, when the group involves four means, the range is compared with 5.096. The
means are listed from largest to smallest and the letters A, B, and C show the significant
groupings. Table 3.16 contains similar results for the Student-Newman-Keuls method. As
indicated above, the critical range values for the Student-Newman-Keuls method are
larger than those for Duncan's method. This increase in the critical range
values enables the Student-Newman-Keuls procedure to control the experimentwise error
rate under the complete null hypothesis (all means equal), whereas the experimentwise error
rate is not controlled with Duncan's method. Table 3.17 contains the results for the
Waller-Duncan method. The minimum significant difference is 4.448, a value smaller than any of


TABLE 3.14

Proc GLM Code to Produce the Duncan,
Student-Newman-Keuls, and Waller-Duncan
Multiple Comparison Procedures

PROC GLM DATA=EX1; CLASS TASK;
MODEL PULSE20=TASK;
MEANS TASK/DUNCAN;
MEANS TASK/SNK;
MEANS TASK/WALLER;


TABLE 3.15

Results of Using Duncan's Multiple Comparison Procedure

$\alpha$                       0.05
Error degrees of freedom       62
Error mean square              30.90445
Harmonic mean of cell sizes    11.22547
Note: Cell sizes are not equal

Number of means    2        3        4        5        6
Critical range     4.691    4.935    5.096    5.213    5.303

Means with the Same Letter are Not Significantly Different

Duncan Grouping    Mean      N     TASK
     A             38.000    10    4
B    A             35.800    10    3
B    C             31.923    13    1
B    C             31.083    12    2
     C             29.500    12    5
     C             28.818    11    6

Note: This test controls the type I comparisonwise error rate, not the
experimentwise error rate.


TABLE 3.16

Results of SNK Multiple Comparison Procedure

$\alpha$                       0.05
Error degrees of freedom       62
Error mean square              30.90445
Harmonic mean of cell sizes    11.22547
Note: Cell sizes are not equal

Number of means    2        3        4        5        6
Critical range     4.691    5.635    6.195    6.593    6.900

Means with the Same Letter are Not Significantly Different

SNK Grouping       Mean      N     TASK
     A             38.000    10    4
B    A             35.800    10    3
B    C             31.923    13    1
B    C             31.083    12    2
     C             29.500    12    5
     C             28.818    11    6

Note: This test controls the type I experimentwise error rate under the
complete null hypothesis but not under partial null hypotheses.


the unadjusted t values in Table 3.4. The minimum significant difference uses the harmonic
mean of the sample sizes to provide one value for all of the comparisons. The minimum
significant difference is small because the value of the F-statistic is large, indicating that
there are likely to be differences among the means.


TABLE 3.17

Results of the Waller-Duncan Multiple Comparison Procedure

K ratio                           100
Error degrees of freedom          62
Error mean square                 30.90445
F-value                           4.49
Critical value of t               2.03599
Minimum significant difference    4.7775
Harmonic mean of cell sizes       11.22547
Note: Cell sizes are not equal

Means with the Same Letter are Not Significantly Different

Waller Grouping    Mean      N     TASK
     A             38.000    10    4
B    A             35.800    10    3
B    C             31.923    13    1
B    C             31.083    12    2
     C             29.500    12    5
     C             28.818    11    6

Note: This test minimizes the Bayes risk under additive loss and
certain other assumptions.
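
The minimum significant difference reported in Table 3.17 can be checked from the other entries of the table. The sketch below assumes the usual LSD form, critical t times the standard error of a difference of two means based on the harmonic mean sample size; that formula and the function name `waller_msd` are our assumptions, not something stated in the SAS output.

```python
import math


def waller_msd(t_crit, mse, n):
    """Minimum significant difference: t_crit * sqrt(2 * MSE / n).

    Assumes every mean is based on n observations (here the harmonic mean).
    """
    return t_crit * math.sqrt(2.0 * mse / n)


# Entries taken from Table 3.17.
msd = waller_msd(t_crit=2.03599, mse=30.90445, n=11.22547)
print(round(msd, 3))
```

Under that assumption the computed value agrees with the tabled minimum significant difference to rounding.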


3.20 A Caution 

Before concluding this chapter, it is noted that the underlining or grouping procedure can 
give rise to inconsistencies when the sample sizes are unequal. Consider an example where 
the estimate of the standard deviation is 2, which is based on 50 degrees of freedom. The 
sample sizes and sample means are: 


Treatment        1       2       3       4
$\hat{\mu}_i$    39.3    40.1    42.0    43.0
$n_i$            2       25      25      2


The F-statistic needed to test the hypothesis of equal means has a value of 4.90, with a
significance level of 0.005. The 5% LSD value for comparing $\mu_1$ with $\mu_2$, $\mu_1$ with $\mu_3$, $\mu_2$ with $\mu_4$,
and $\mu_3$ with $\mu_4$ is $(2.008)(2)\sqrt{\tfrac{1}{2} + \tfrac{1}{25}} = 2.951$. The 5% LSD value for comparing $\mu_1$ with $\mu_4$ is
$(2.008)(2)\sqrt{\tfrac{1}{2} + \tfrac{1}{2}} = 4.016$ and the 5% LSD value for comparing $\mu_2$ with $\mu_3$ is
$(2.008)(2)\sqrt{\tfrac{1}{25} + \tfrac{1}{25}} = 1.136$. Thus, for these data the difference between the largest and smallest means,
$\hat{\mu}_4 - \hat{\mu}_1 = 3.7$, is not significant, while the smaller difference between the two middle means,
$\hat{\mu}_3 - \hat{\mu}_2 = 1.9$, is significant. One can explain this apparent inconsistency by noting that
there is enough information (large enough sample sizes) to claim a statistically significant
difference between $\mu_2$ and $\mu_3$, but not enough information (sample sizes are too small) to
claim a statistically significant difference between any other pairs of means.
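
The LSD values in this example are easy to verify. The following Python sketch uses the quantities stated above (critical t of 2.008, estimated standard deviation 2, and the four sample sizes and means); the function name `lsd` and the variable names are ours.

```python
import math


def lsd(t_crit, sigma_hat, n_i, n_j):
    """Least significant difference for two means with sizes n_i and n_j."""
    return t_crit * sigma_hat * math.sqrt(1.0 / n_i + 1.0 / n_j)


means = {1: 39.3, 2: 40.1, 3: 42.0, 4: 43.0}
sizes = {1: 2, 2: 25, 3: 25, 4: 2}
t_crit, sigma_hat = 2.008, 2.0

significant = []
for i in means:
    for j in means:
        if i < j:
            diff = abs(means[i] - means[j])
            cutoff = lsd(t_crit, sigma_hat, sizes[i], sizes[j])
            if diff > cutoff:
                significant.append((i, j))
            print(i, j, round(diff, 1), round(cutoff, 3), diff > cutoff)
```

Only the pair (2, 3) exceeds its LSD, even though the pair (1, 4) has the largest observed difference.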


3.21 Concluding Remarks

In this chapter, procedures for making many inferences from a single experimental data
set were discussed. Such procedures are necessary to ensure that differences between
treatments that are observed are due to real differences in the parameter functions being
compared and not due to chance alone. Some procedures are more appropriate for planned
comparisons, while others are more appropriate for data snooping. For studies with many
variables, the F-test for equality of all means can be used to eliminate carrying out multiple
comparisons on some variables. If the F-test is not significant at the desired $\alpha$ level, then
one need not necessarily carry out further comparisons. But if the F-test is significant at the
desired $\alpha$ level, then a multiple comparison method that adjusts for multiplicity should be
used to compare the means (this is not Fisher's protected LSD). Finally, the multiple comparison
and testing procedures discussed can be applied to the unequal variance models
(discussed in Chapter 2) by using the LSMeans statement in SAS-Mixed where the
REPEATED statement is used to specify the unequal variance structure. Recommendations
about which procedure to use in a given circumstance were also given.


3.22 Exercises 

3.1 Use the data in Exercise 1.1 and carry out the following multiple comparisons. 

1) Carry out all pairwise comparisons among the five means using methods of 
Bonferroni, Scheffe, Tukey-Kramer, Sidak, simulate, and Fisher. 

2) Consider technique 4 as the control and use methods of Dunnett, Bonferroni,
Sidak, Scheffe, multivariate t, and simulate to compare all other techniques
to technique 4.

3) Use the methods of Bonferroni, Bonferroni-Holm, Sidak, Sidak-Holm, simulate,
and Scheffe to provide adjusted p-values with familywise error rate protection
for the following linear combinations of the means:

Mi + M2 - Ms ~ Pa, Mi + M2 ~ Ms - Ms. (Mi + M2 + Ms)/3 - (Ms + Mr)/2 
(Mi + M2 + M3)/ 3 “ (M3 + Mr)/2/ (Mi + Mr + M5)/ 3 “ (M2 + M3 + M5)/ 3 
(Mi + Mr + Ms)/ 3 - (Ms + Mr)/2 

3.2 Use the data in Exercise 1.2 and carry out the following multiple comparisons. 

1) Use ration 1 as the control and compare the other four rations to the control using
the methods of Dunnett, Bonferroni, Sidak, Scheffe, multivariate t, and simulate.

2) Use the false discovery rate to make all pairwise comparisons. 

3.3 Use the data in Exercise 1.3 and carry out the following multiple comparisons. 

1) Provide simultaneous tests that control the FWER to 0.05 for the following
hypotheses:

$H_0\colon -2\mu_1 - \mu_2 - 0\mu_3 + \mu_4 + 2\mu_5 = 0$ vs $H_a\colon$ (not $H_0$)
$H_0\colon 2\mu_1 - \mu_2 - 2\mu_3 - \mu_4 + 2\mu_5 = 0$ vs $H_a\colon$ (not $H_0$)
$H_0\colon -1\mu_1 + 2\mu_2 + 0\mu_3 - 2\mu_4 + 1\mu_5 = 0$ vs $H_a\colon$ (not $H_0$)
$H_0\colon 1\mu_1 - 4\mu_2 + 6\mu_3 - 4\mu_4 + 1\mu_5 = 0$ vs $H_a\colon$ (not $H_0$).

2) Use the methods of Bonferroni, Scheffe, Tukey-Kramer, Sidak, simulate, and 
Fisher to carry out all pairwise comparisons. 

3) Consider $\mu_1 - \mu_2$, $\mu_1 - \mu_3$, $\mu_1 - \mu_4$, and $\mu_1 - \mu_5$ as primary comparisons and use
the multivariate t to construct a confidence interval about the secondary
comparison $\mu_1 - (1/4)(\mu_2 + \mu_3 + \mu_4 + \mu_5)$.

3.4 Use the data in Exercise 1.1 and construct simultaneous 95% confidence intervals
about $\sigma^2$, $\mu_1 - \mu_2$, $\mu_1 - \mu_3$, $\mu_1 - \mu_4$, $\mu_1 - \mu_5$, $\mu_2 - \mu_3$, $\mu_2 - \mu_4$, $\mu_2 - \mu_5$, $\mu_3 - \mu_4$,
$\mu_3 - \mu_5$, $\mu_4 - \mu_5$.

3.5 For the data in Exercise 2.3 use the unequal variance model and the methods of 
Bonferroni, Bonferroni-Holm, Sidak, Sidak-Holm, simulate, and Scheffe to 
provide adjusted p-values and simultaneous confidence intervals about the 
following five comparisons: 

1) The mean of the Blue Choc = the mean of the Red Choc. 

2) The mean of the Buttons = the mean of the means of the Blue Choc and Red 
Choc. 

3) The mean of the ChocChip = the mean of the WchocChip. 

4) The mean of the Small Choc = 1/2 the mean of the means of the Blue Choc 
and Red Choc. 

5) The mean of the Blue Choc and Red Choc = the mean of the ChocChip and
WchocChip.


Basics for Designing Experiments


Properly designed and analyzed experiments provide the maximum amount of information
about the conditions investigated for the resources used. This chapter presents concepts
and methods for experimenters to use in designing and analyzing experiments. The
basic concepts discussed in this chapter are treatment structure and design structure as well
as the ideas of replication, blocking, and experimental unit. Examples of combining design and
treatment structures are presented to demonstrate the concepts for complete block and
incomplete block designs. These designs involve one size of experimental unit. Chapter 5
describes the concept of the size of the experimental units and presents various designs
involving more than one size of experimental unit. The design structures presented in
this chapter include the completely randomized (CRD), randomized complete block (RCBD),
incomplete block (IBD), and Latin square (LSD) designs. The treatment structures considered include
the one-way, two-way, two-way with controls, fractional factorial, and n-way structures.
In this chapter, the models and analysis of variance tables with necessary sources of variation
and degrees of freedom are presented. The discussion provides methods to determine
the sources of variation used to compute the error sum of squares and algorithms to
compute the resulting degrees of freedom. In general, the error sums of squares are
obtained from comparisons of observations or linear combinations of observations that are
treated alike. The computation of other sums of squares is discussed in later chapters. The
basic approach in this chapter is to demonstrate the concepts with examples. The split-plot,
repeated measures, strip-plot, and crossover designs use the concept of different sizes of
experimental units and are described in Chapter 5. Designs involving nesting are also
discussed in Chapter 5.


4.1 Introducing Basic Ideas 

Design of experiments is concerned with planning experiments in order to obtain the maximum
amount of information from the available resources. The design of an experiment
should start with a statement of objectives to be attained. The primary response variable or
variables should be specified and a connection between the experimental objectives and
the response variables should be fully described. The entities to be used as experimental
units should be explained and related to the objectives of the experiment. Often, when
designing an experiment, the experimenter has control over certain factors called treatments,
populations, or treatment combinations. The experimenter generally controls the
choice of the experimental units to be used and whether those experimental units can be
put into groups, called blocks. A typical experiment involves t treatments (or treatment
combinations) that are to be compared or whose effects are to be studied.

Before an experiment can be carried out, several questions must be answered: 

1) What are the objectives of the experiment and what is (are) the response variable(s)? This 
is a very important aspect of the design as it is imperative to understand the 
process that is to generate the data and how the data relate to the objectives. 

2) How many treatments are to be studied? The number of treatments may already be
specified, but sometimes a discussion as to how the choice of the treatments relates
to the objectives of the study is in order. For example, a nutrition student wanted to
design a study to determine the maximum amount of soy flour that can be used in
place of wheat flour in a cookie product so that there is no soy flavor in the resulting
cookies. The student selected cookie formulations that involved replacing 0, 10, 20,
30, 40, 50, 60, 70, 80, 90, and 100% of the wheat flour with soy flour. The first question
asked was, "If you use 100% soy flour can you taste the soy flavor?" For if you cannot
taste the soy flavor with 100% soy flour, then there is no need to run the experiment,
and if you can taste the soy flavor there is no need to include 100% soy flour in the
set of treatments. After discussing the set of treatments, it was decided that it was
unknown if soy flavor could be tasted with products made from 20, 30, 40, and 50%
soy flour. That is, it was determined that one could not taste soy flavor with 10% soy
flour and one could taste the soy flavor with more than 50% soy flour. After relating
the selection of the treatments to the objectives of the study, only five (0, 20, 30, 40,
and 50% soy flour) of the initial 11 treatments were needed. The 0% was included as
the wheat flour control. This process greatly reduced the number of samples needed
for the study.

3) How many times does each treatment need to be observed? This question relates to the
sample sizes needed to achieve the specific objectives.

4) What are the experimental units? Many experiments involve only one size of experimental
unit. But the important idea involved is that of an independent replication.
Some experiments involve more than one size of experimental unit, a concept
described in Chapter 5. Often researchers think that only one size of experimental
unit is involved in an experiment and fail to recognize situations where more than
one size of experimental unit is involved. This question needs to be carefully
addressed, as discussed in Chapter 5.

5) How does the experimenter apply the treatments to the available experimental units and
then observe the responses? This question relates to the use of randomization to
assign treatments to experimental units as well as the use of randomization in
other parts of the process providing the data. It is important that randomization
be used to assign treatments to experimental units. It is also important to run
samples through the laboratory in a random order or to use a random order to
evaluate subjects in a medical study.





6) Can the resulting design be analyzed or can the desired comparisons be made? This is possibly
the most important question for the researcher, but a major goal of this book
is to enable the reader to use more complicated designs and still be able to carry
out the analyses to estimate important parameters and test desired hypotheses.

The answers to these questions are not necessarily straightforward and the questions 
cannot be answered in a general way. Hopefully, the ideas and concepts discussed here will 
help the experimenter put together enough information to provide answers for their study. 

To continue, consider an experiment involving t treatments in which each treatment is
applied to r different experimental units. A mathematical model that can be used to describe
$y_{ij}$, the response observed from the jth experimental unit assigned to the ith treatment, is

$$y_{ij} = \mu_i + \varepsilon_{ij} \quad \text{for } i = 1, 2, \ldots, t \text{ and } j = 1, 2, \ldots, r \tag{4.1}$$

where $\mu_i$ is the true, but unknown, mean of the responses to the ith treatment and $\varepsilon_{ij}$ is a random
variable representing the noise resulting from natural variation and other possible
sources of random and nonrandom error. Researchers should do their best to control nonrandom
sources of error, which include model error, measurement error, observational error,
and misspecification of treatments. In order to conduct this experiment, the researcher
must select rt experimental units and then randomly assign r of the experimental units to
each treatment. The randomization part of this process is very important in preventing bias from
entering into the treatment assignments. Just by the fact that experimental units are randomly
assigned to treatments, a randomization or permutation analysis can be used to develop the
theory for an appropriate analysis (Kempthorne, 1952). The very least that can be said about
the use of randomization is that it prevents the introduction of systematic bias into the experiment.
If the experimenter does not use randomization, then she cannot tell whether an
observed difference is due to differences in response of the experimental units to the treatments
or due to the systematic method used to assign the experimental units to the treatments.
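
The random assignment of r experimental units to each of t treatments can be sketched in Python as follows. This is a minimal illustration using the standard library; the function name `randomize_crd`, the unit labels, the treatment labels, and the seed are all hypothetical choices made for the example.

```python
import random


def randomize_crd(units, treatments, r, seed=None):
    """Randomly assign r experimental units to each treatment.

    units: labels of the r * t experimental units.
    treatments: labels of the t treatments.
    Returns a dict mapping each treatment to its list of assigned units.
    """
    units = list(units)
    if len(units) != r * len(treatments):
        raise ValueError("need exactly r units per treatment")
    rng = random.Random(seed)  # a fixed seed makes the plan reproducible
    rng.shuffle(units)
    return {trt: units[k * r:(k + 1) * r] for k, trt in enumerate(treatments)}


# Twelve hypothetical units, three treatments, r = 4 units per treatment.
plan = randomize_crd(range(1, 13), ["A", "B", "C"], r=4, seed=2024)
for trt, assigned in plan.items():
    print(trt, sorted(assigned))
```

Shuffling the full list of units and then cutting it into consecutive blocks of r gives every possible assignment the same probability, which is what the completely randomized design requires.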

The statistical objective of an experiment is to compare the observed response of treatments
on experimental units. For example, if the researcher wants to compare the effect of a hypertension
compound on human blood pressures, then using white mice as experimental
units in the study will not enable one to make inferences to humans. Inferences can only be
made to the population of experimental units from which those used in the study are a
representative sample. It is very important to characterize the population of experimental
units to which one wishes to make inferences. The sample of experimental units used in
the study must be randomly selected from the population to make inferences to that population.
Often it is not possible to carry out a random selection of experimental units from a
population of experimental units to be included in the study. But at a minimum, the sample of
experimental units must be representative of the population or a conceptual population.
For example, when sampling a production line, one hopes the items selected for the study
will be representative of the yet to be produced population of items. If the production process
is not changed, then it is reasonable to assume that the selected items will be representative
of future items. It is very important to describe the population of experimental units
represented by the experimental units used in the experiment. If the experimental units are
not representative of the population of experimental units to which inferences are to be
made, inferences cannot be made, but instead, one can make conjectures as to the effect of
the treatments on an unsampled population. For example, information about hypertension
compounds acquired from a study involving white mice can be used to make a conjecture
(not make an inference) about the effects of the hypertension compounds on humans.

When selecting experimental units for a study, one obtains better comparisons between
treatments when the set of experimental units used in the study is homogeneous or
very nearly alike. In many experiments, it is impossible to select rt identical experimental
units. The nonidentical experimental units contribute to the noise in the data through
the $\varepsilon_{ij}$. When experimental units are not homogeneous or not alike, there are three methods
that can be used to help account for variation among the experimental units. One
method is to measure characteristics that describe differences among the experimental
units, such as weight and age, and use analysis of covariance as described in Milliken
and Johnson (2001). A second method is to group experimental units into sets of nearly
alike experimental units. Experimental units that are nearly alike are called homogeneous.
When this is the case, the treatments can be compared on the similar experimental
units within a group where the group to group variation can be accounted for in the
analysis. Groups of similar experimental units are called blocks. A third method of
accounting for variability among experimental units is to use both blocking and covariate
analysis. Let there be r blocks each with t experimental units where each treatment
occurs once in each block. A model that represents the observed response of the ith treatment
in the jth block is

$$y_{ij} = \mu_i + b_j + \varepsilon^*_{ij} \quad \text{for } i = 1, 2, \ldots, t \text{ and } j = 1, 2, \ldots, r \tag{4.2}$$

For model (4.2), the $\varepsilon_{ij}$ in model (4.1) have been replaced by $\varepsilon_{ij} = b_j + \varepsilon^*_{ij}$; that is, the variation
between groups or blocks of experimental units has been identified and isolated from $\varepsilon^*_{ij}$,
which represents the variability of experimental units within a block. By isolating the
block effect from the experimental units, the within-block variation can be used to compare
treatment effects, which involves computing the estimated standard errors of contrasts of
the treatments.

Two treatments (or any contrast of treatments) can be compared, free of block effects, by 
taking within-block differences of the responses of the two treatments as 

$$y_{ij} - y_{i'j} = (\mu_i + b_j + \varepsilon^*_{ij}) - (\mu_{i'} + b_j + \varepsilon^*_{i'j}) = \mu_i - \mu_{i'} + \varepsilon^*_{ij} - \varepsilon^*_{i'j}$$

which does not depend on the block effect $b_j$. The result of this difference is that the variance
of the difference of two treatment responses within a block depends on the within-block 
variation among the experimental units and not the between-block variation. 
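
The cancellation of the block effect can be checked numerically. The following is a minimal simulation sketch, not part of the original analysis: the treatment means, the block standard deviation, and the within-block standard deviation are all assumed values chosen only for illustration. It compares the variance of differences taken within a block with the variance of differences taken between units in different blocks.

```python
import random
import statistics


def simulate_differences(r, sigma_block, sigma_within, mu=(10.0, 12.0), seed=1):
    """Simulate treatment differences taken within and across blocks.

    All parameter values are illustrative assumptions, not data from the text.
    """
    rng = random.Random(seed)
    within, across = [], []
    for _ in range(r):
        b = rng.gauss(0.0, sigma_block)  # block effect b_j shared by both units
        y1 = mu[0] + b + rng.gauss(0.0, sigma_within)
        y2 = mu[1] + b + rng.gauss(0.0, sigma_within)
        within.append(y1 - y2)  # b_j cancels in the within-block difference

        # Same two treatments, but the second unit sits in a different block,
        # so the two block effects do not cancel.
        b_other = rng.gauss(0.0, sigma_block)
        y2_other = mu[1] + b_other + rng.gauss(0.0, sigma_within)
        across.append(y1 - y2_other)
    return within, across


within, across = simulate_differences(r=5000, sigma_block=5.0, sigma_within=1.0)
var_within = statistics.variance(within)  # estimates 2 * sigma_within**2
var_across = statistics.variance(across)  # estimates 2 * (sigma_block**2 + sigma_within**2)
print(f"within-block: {var_within:.2f}  across-block: {var_across:.2f}")
```

Under these assumed values the within-block variance should be close to 2, while the across-block variance is dominated by the block-to-block variation.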

An objective of experimental design is to select and group the experimental material so 
that the noise or experimental error amongst the experimental units within groups in the 
experiment is reduced as much as possible. Thus, the experimental units on which the 
treatments are to be compared should be as much alike as possible so that a smaller difference
between two treatments can be detected as a significant difference.

If there are t treatments and t experimental units, an experiment can be conducted and 
the mean of each treatment can be estimated from the observations. But an estimate of the 
error variance cannot be obtained. An estimate of the error variance (associated with $\varepsilon_{ij}$ or
$\varepsilon^*_{ij}$) can be obtained only when some or all of the treatments are replicated. A replication of
a treatment is an independent observation of the treatment. Thus two replications of a 
treatment must involve two experimental units. An experimental unit is the entity to 
which the treatment has been applied. But the replications of the treatments must involve
processes so that the treatment is applied and observed independently on each experimental
unit. For example, suppose a researcher wants to compare the effect of two diets on the
growth rate of rabbits and he has 10 rabbits to use in the study. The process would be to
randomly assign five rabbits to each of the treatments. But what if he puts the five rabbits
assigned to one of the diets in one cage and the five rabbits assigned to the second diet in
another cage, where the rabbits within a cage are fed the diet from a common bowl? Each
individual rabbit was randomly assigned to a diet, but since all five rabbits are put into
one cage and fed from a common bowl, the rabbits are not observed independently of each
other. In this case, the cage of five rabbits becomes the experimental unit instead of the
individual rabbit. The cages are treated independently of each other, but the rabbits within
a cage are not treated independently. Thus the rabbits within the cage do not provide independent
replications of the diet. It is very important to understand the complete process
involved in carrying out a study in order to be able to see when independent replications 
occur and when they do not occur. It is imperative that this definition be observed to 
determine when replications are utilized during an experiment. Too often researchers 
use duplicate or split samples to generate two observations and call them replicates, when, 
in reality, they are actually sub-samples or repeated measures. Duplicates certainly do not 
provide the same information as independent replicates. 

Suppose a researcher wanted to study the differences in the heights of male and female 
students at a specified university. Assume there are 22,000 students in the university. 
A process could be to randomly select 100 female students and 100 male students from the 
population of students at the university. The next step would be to find all of the students 
in the random sample and measure their heights, thus producing 100 measurements of 
heights of females and 100 measurements of heights of males. This seems like a lot of work 
to find all 200 students. Suppose, instead, that the researcher selects one female and one 
male and measures the height of each 100 times. This second process produces 100 measurements
of heights of females (one female in this case) and 100 measurements of heights
of males (also one male in this case). There are 200 observations in each of the data sets, but
the variability among the 100 measurements within a set from the first process provides
a measure of the variance among female students and among male students at the university.
The variability of the 100 measurements within a set in the second case provides a
measure of variance among the 100 measurements made on the same person. This second
case provides information about the measurement process variability (repeated measurements
of the same person), but not information about the variability among heights of
females or of males at the specified university. The independent measurements of the
height of one person do not provide a measure of the true variation in the heights of the
population of people. The measurements of the heights of 100 females provide 100 replications,
whereas the 100 measurements of a single person provide repeated measurements on
that person. These 100 measurements on the same person are called repeated measurements
or sub-samples, but not replications. 
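
The two sampling processes can be contrasted with a small simulation. This is only a sketch under assumed values: the mean height, the person-to-person standard deviation, and the measurement standard deviation are hypothetical numbers, not quantities given in the text.

```python
import random
import statistics

rng = random.Random(2024)

# Hypothetical values chosen only for illustration (centimeters).
MEAN_HEIGHT = 165.0
SIGMA_PERSON = 7.0  # assumed SD of true heights among students
SIGMA_MEAS = 0.5    # assumed SD of the measurement process


def measure(true_height):
    """Return one noisy measurement of a person's true height."""
    return true_height + rng.gauss(0.0, SIGMA_MEAS)


# Process 1: 100 different students, each measured once (replications).
replicates = [measure(rng.gauss(MEAN_HEIGHT, SIGMA_PERSON)) for _ in range(100)]

# Process 2: one student measured 100 times (repeated measurements).
one_person = rng.gauss(MEAN_HEIGHT, SIGMA_PERSON)
repeated = [measure(one_person) for _ in range(100)]

var_replicates = statistics.variance(replicates)  # estimates SIGMA_PERSON**2 + SIGMA_MEAS**2
var_repeated = statistics.variance(repeated)      # estimates SIGMA_MEAS**2 only
print(f"replications: {var_replicates:.2f}  repeated measurements: {var_repeated:.2f}")
```

Both lists contain 100 values, but only the first one carries information about the variation among students.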

A baker ran an experiment to compare the abilities of three preservatives to inhibit mold 
growth in a certain type of cake product. The baker mixed and baked one cake with each 
preservative. The number of mold spores per cubic centimeter of cake was measured after
nine days of storage. The baker wanted 10 replications for the analysis so he split each cake
into 10 slices and obtained the spore count on each slice. However, those 10 measurements
did not result from 10 independent applications of the preservative. The variation measured
by his sub-samples is an index of the within-cake variation and not an index of the
experimental-unit-to-experimental-unit (cake-to-cake) variation within a preservative.
To obtain 10 replications of each preservative, the baker needs to bake 10 cakes with each 
preservative. These cakes need to be mixed and baked independently of each other. It 
might be easier to mix up one large batch of cake dough, mix in the preservative and then 
pour the mixture into 10 cake pans. This process provides 10 cakes, but the cakes are not
mixed independently of one another, so they are not independent replications. The baker 
needs to mix 10 batches of dough with preservative and then bake one cake from each 
batch in different ovens (or at different times) to obtain 10 replications of a preservative. 
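
The bookkeeping behind the baker's problem can be sketched as a degrees-of-freedom count. The helper names below are hypothetical, and the count assumes a completely randomized design in which each independently mixed and baked cake is one experimental unit.

```python
def experimental_error_df(units_per_treatment):
    """Error df from experimental units treated alike: sum of (n_i - 1)."""
    return sum(n - 1 for n in units_per_treatment)


def subsampling_df(units_per_treatment, subsamples_per_unit):
    """Within-unit df: each unit contributes (subsamples - 1)."""
    return sum(units_per_treatment) * (subsamples_per_unit - 1)


# The baker's original plan: one cake per preservative, 10 slices per cake.
df_error_original = experimental_error_df([1, 1, 1])
df_slices_original = subsampling_df([1, 1, 1], 10)

# The replicated plan: 10 independently mixed and baked cakes per preservative.
df_error_replicated = experimental_error_df([10, 10, 10])

print(df_error_original, df_slices_original, df_error_replicated)
```

The original plan yields 27 degrees of freedom among slices but none for cake-to-cake variation, so the preservatives cannot be compared against the appropriate error term.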

Another example of nonreplication involves what some researchers call a strip trial. In 
agronomy, the strip trial consists of planting all the seed of a given variety of plant in one 
row (or group of rows) and each row (or group of rows) is planted with a different variety. 
The rows (or groups of rows) are then partitioned into, say, eight parts, and the parts are 
called "replications." The diagram in Figure 4.1 represents six strip plots of four rows each 
where the strip plots are split into eight parts denoted as replications. The advantage of 
using a strip trial instead of eight independent replications is that the researcher need not 
continually change the seed from one planter box position to another planter box position 
in the strip trial and the planting plan is very simple. Now, if she wants to run a well- 
designed experiment, she would have to change the planter boxes eight times, as dictated 
by a specific randomization scheme. Doing such a randomization would provide eight 
blocks in a randomized complete block design structure. If the experimenter analyzes the 
strip trial data as she would a randomized complete block design, her analysis will be 
incorrect. In fact, the strip trial experiment cannot be used to make inferences about variety
differences since there is only one independent observation of each variety. The four
rows or strip plot is the experimental unit to which the variety is applied. In the strip trial, 
the researcher could have just as easily partitioned the strip plots into 20 parts or 100 parts; 
after all, with more "replications," one can detect smaller differences between two means 
as being significant. But these measurements made on each of these strips are not true 
replications; instead they are subsamples or repeated measurements of the strip plots. 
Thus, obtaining more and more parts does not aid in detecting differences between the 
means. One test for determining whether a part or an observation is a true replication is
the following: If the researcher could have just as easily obtained more "replications" by
splitting, then she is not obtaining true replications, but is obtaining subsamples or
repeated measures. It is very important to distinguish between a subsample and a replication
since the error variance estimated from between subsamples is in general considerably
smaller than the error variance estimated from replications or between experimental units.
The values of F-statistics constructed using the error variance computed from subsamples
will be much larger than they should be, leading the experimenter to determine more
differences as being statistically significant than she should.

FIGURE 4.1 Schematic of a strip plot design with four row plots and six varieties arranged in eight pseudoreplications.
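
The inflation of the F-statistic can be illustrated by simulating strip trials in which the varieties do not differ at all. This is a sketch under assumed values: the strip-to-strip and part-to-part standard deviations are hypothetical, and the critical value 2.44 is an approximate upper 5% point of the F distribution with 5 and 42 degrees of freedom.

```python
import random
import statistics


def one_way_f(groups):
    """F statistic that (incorrectly) treats every part of a strip as a replicate."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    means = [statistics.fmean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def strip_trial(rng, varieties=6, parts=8, sigma_strip=3.0, sigma_part=1.0):
    """One strip per variety, split into parts; there is NO true variety effect."""
    data = []
    for _ in range(varieties):
        strip = rng.gauss(0.0, sigma_strip)  # strip (experimental unit) effect
        data.append([strip + rng.gauss(0.0, sigma_part) for _ in range(parts)])
    return data


rng = random.Random(7)
F_CRIT = 2.44  # approximate 5% critical value of F(5, 42); an assumed constant
n_sim = 500
rejections = sum(one_way_f(strip_trial(rng)) > F_CRIT for _ in range(n_sim))
rate = rejections / n_sim
print(f"share of null strip trials declared significant: {rate:.2f}")
```

With these assumed variances nearly every simulated trial is declared significant, even though the nominal error rate is 5% and no variety differences exist.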


4.2 Structures of a Designed Experiment 

A designed experiment consists of two basic structures and it is vital to be able to identify 
and distinguish between each structure. Before an appropriate model and analysis can be 
constructed for a specific design, all of the factors used in a design of an experiment must 
be classified as belonging to either the treatment structure or the design structure. The 
following definitions and discussion are used to help delineate the differences between 
the two structures. 

Definition 4.1: The treatment structure of a designed experiment consists of the set of 
treatments, factors, treatment combinations, or populations that the experimenter has 
selected to study and/or compare. 

The treatment structure is constructed from those factors or treatments to be compared 
as measured by their effect on given response variables. The factors in the treatment structure
must be selected to address the stated objectives of the experiment. The treatment
structure could be a set of treatments, called a one-way treatment structure, or a set of
treatment combinations, such as a two-way factorial arrangement or a higher-order factorial
arrangement, plus any controls or standard treatments.

Definition 4.2: The design structure of a designed experiment consists of the grouping of 
the experimental units into homogeneous groups or blocks. 

The design structure of a designed experiment involves the factors used to form groups 
of experimental units so that the conditions under which the treatments are observed are 
as uniform as possible. If all the experimental units are very homogeneous, then there 
need only be one group or block of observations and the experimental units can be assigned 
to the treatments completely at random. Such a design structure is called a completely 
randomized design structure. 

If more than one group of experimental units is required so that the units within a group 
are much more homogeneous than experimental units between groups, then the design 
structure involves some type of a blocked design. There are several factors that can be used 
to form blocks of experimental units, but, as is discussed below, the factor or factors used 
to construct blocks must not interact with the factors in the treatment structure. Once the 
treatment structure and design structure have been selected, the designed experiment is 
specified by describing exactly the method of randomly assigning (randomizing) the treatments
of the treatment structure to the experimental units in the design structure.

FIGURE 4.2 Graphical demonstration of the use of randomization to combine the treatment structure with the
design structure to form the total design of an experiment.


Thus, the total designed experiment involves 1) the choice of the treatment structure, 2) 
the choice of the design structure, and 3) the method of randomly assigning the treatments
or treatment combinations in the treatment structure to the experimental units in the 
design structure. Figure 4.2 represents the two parts of a designed experiment. 
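
The three steps can be made concrete for the simplest case, a one-way treatment structure in a completely randomized design structure. The sketch below is illustrative only: the function name, the unit labels, and the treatment labels are hypothetical, and equal replication is assumed.

```python
import random


def randomize_crd(units, treatments, reps, seed=None):
    """Randomly assign each treatment to `reps` experimental units.

    Assumes a single homogeneous group of units (completely randomized design).
    """
    if len(units) != len(treatments) * reps:
        raise ValueError("number of units must equal treatments x reps")
    rng = random.Random(seed)
    labels = [trt for trt in treatments for _ in range(reps)]
    rng.shuffle(labels)  # step 3: the randomization
    return dict(zip(units, labels))


treatments = ["A", "B", "C", "D", "E"]      # step 1: treatment structure (one-way)
units = [f"unit{u}" for u in range(1, 21)]  # step 2: design structure (one group of 20 units)
plan = randomize_crd(units, treatments, reps=4, seed=11)

for unit, trt in plan.items():
    print(unit, trt)
```

Fixing `seed` makes a plan reproducible for the protocol; leaving it as `None` draws a fresh randomization each time.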

The total designed experiment dictates the appropriate model to be used to obtain an 
appropriate analysis. In constructing the model to describe the design, two basic assumptions
are made about design and treatment structures. First, it is assumed that the components
of the design structure are random effects, that is, the blocks used in the design
are a random sample from the population of possible blocks of experimental units. This 
assumption implies there is a population of blocks of experimental units to which the 
researchers wish to make inferences. Second, it is assumed that there is no interaction 
between the components or factors of the design structure and the components or factors 
of the treatment structure. In other words it is assumed that the relationships existing 
between the treatments will be consistent from block to block (except for random variation)
or, stated another way, that the blocking factors will not influence the relationship
between the treatments. Many textbooks describe blocking factors as nuisance factors
and do not address the possibility of the nuisance factors interacting with the factors 
in the treatment structure (Cobb, 1997). If such interactions can occur, those nuisance 
factors must either be included in the treatment structure or the values of the nuisance 
factors could be considered as covariates with the possibility of unequal slopes (Milliken 
and Johnson, 2001). The selection of factors for possible use of constructing blocks 
must be carefully evaluated to prevent probable interaction with factors in the treatment 
structure. 

The design structure is selected by using all available knowledge of the experimental 
units and is chosen independently of the treatment structure (do not let the treatment structure
influence the selection of a poor design structure). Likewise, the experimenter should
select the treatment structure without any knowledge of the design structure (do not let the
design structure hamper the selection of the necessary set of treatments). After the appropriate
design structure is specified and the desired treatment structure selected, some compromises
may be needed in either one or both of the structures to make them compatible with
one another and to enable the experimenter to conduct an effective experiment. 

4.2.1 Types of Design Structures 

The design structure is determined by the type of blocking or grouping of the experimental
units into homogeneous groups and is specified by the factors used to form the blocks.
The following are descriptions of some common design structures. There are two basic
design structures which are described as complete block and incomplete block design 
structures. Several specific design structures are examined in more detail in Section 4.3. 

1) Completely randomized design structure. In a completely randomized design structure,
all experimental units are assumed to be homogeneous and the experimental
units are assigned to the treatments completely at random. Generally, the treatments
are assigned to an equal number of experimental units, although this is not
required. This design structure may also be used when the experimental units are 
not homogeneous and the experimenter cannot find factors that allow them to be 
grouped into more homogeneous groups. Analysis of covariance could be used 
where the values of the blocking factors are used as covariates instead of using 
them to form blocks (Milliken and Johnson, 2001). 

2) Randomized complete block design. If there are t treatments, then the randomized 
complete block design structure consists of having blocks of experimental units 
with t or more experimental units in each block. With the block size equal to or 
greater than the number of treatments, there is the possibility of having a complete 
set of treatments occurring within each block, thus the name, randomized complete 
block. If the block size is exactly equal to t for each block, then each treatment is 
randomly assigned to exactly one experimental unit within each block. If there are 
more than t experimental units within each block, then each treatment can be 
assigned to one experimental unit and some treatments (maybe all) can be assigned 
to more than one experimental unit. If t = 5 and the blocks are of size 8, then each
treatment will be assigned to one experimental unit within each block and three of the
treatments will be assigned to one additional experimental unit for a total of two
replications of those treatments. One strategy would be to set out a pattern of treatments
to blocks where the numbers of observations per treatment are as balanced
as possible. Figure 4.3 consists of one way to assign five treatments to three blocks
of size 8. Within each block, randomly assign the respective treatments to the
experimental units within that block. The arrangements shown are one set of the
possible randomizations. If each block consists of c × t experimental units where c
is an integer, then each treatment can be assigned to c experimental units within
each block. This is also a randomized complete block design structure. A randomized
complete block design structure is any blocking scheme in which the number
of experimental units within a block is greater than or equal to the number of treatments,
and thus a complete set of treatments can be assigned to experimental units
in each block.

FIGURE 4.3 Randomized complete block design structure with five treatments in blocks of size 8.

3) Latin square design. The Latin square design structure consists of blocking in two
directions. For an experiment involving t treatments, $t^2$ experimental units are
arranged into a t × t square where the rows are called row blocks and the columns
are called column blocks. Thus the t × t arrangement of experimental units is blocked
in two directions (row blocks and column blocks). To construct a Latin square
design structure, the treatments are randomly assigned to experimental units in 
the square such that each treatment occurs once and only once in each row block 
and once and only once in each column block. See Cochran and Cox (1957) for 
various arrangements of treatments into row and column blocks. Blocking in two 
or more directions is common in many disciplines, in particular, blocking by rows 
and columns is useful when the experimental units occur in a rectangle (one of the 
dimensions could be time). Graeco-Latin squares can be used to form blocks in 
three directions (Cochran and Cox, 1957). 

4) Incomplete block designs. Incomplete block designs occur when the number of treatments
exceeds the number of experimental units in one or more blocks. When this
occurs, then a complete set of treatments cannot occur within each block, hence 
the name "incomplete block." There are several special incomplete block design 
structures such as balanced incomplete blocks and partially balanced incomplete 
blocks. A balanced incomplete block design structure is one where the assignment 
of treatments to blocks is such that every pair of treatments appears in the same 
block an equal number of times. A partially balanced incomplete block design 
structure occurs when sets of treatments occur together within blocks an equal
number of times and other treatments occur a different number of times together 
within some blocks. The split-plot design structure, discussed in Chapter 5, is an 
example of a partially balanced incomplete block design structure. Some incomplete block
design structures are described in Example 4.5. 

5) Various combinations and generalizations. There are various ways to group the experimental
units. Sometimes a grouping does not satisfy the above definitions but
still provides a valid design structure. An example is where the block sizes vary 
from block to block where some blocks are incomplete while others are complete. 

In any case, these other blocking schemes can provide an experimenter with very 
viable design structures with which to conduct effective experiments. 
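
Two of the design structures above lend themselves to a short randomization sketch. The code below is illustrative and rests on stated assumptions: the rotation used to hand out the extra units in a block is one hypothetical way of keeping the treatment totals balanced (it is not necessarily the pattern of Figure 4.3), and the Latin square is obtained by shuffling a cyclic square, which does not sample uniformly from all Latin squares.

```python
import random


def randomize_rcbd(treatments, block_size, n_blocks, seed=None):
    """Randomized complete block plan with block_size >= number of treatments.

    Every treatment occurs at least once per block; extra units are given to
    treatments in rotation so that the overall totals stay nearly balanced.
    """
    t = len(treatments)
    if block_size < t:
        raise ValueError("block too small to hold a complete set of treatments")
    rng = random.Random(seed)
    base, extra = divmod(block_size, t)
    plan, start = [], 0
    for _ in range(n_blocks):
        block = list(treatments) * base
        block += [treatments[(start + k) % t] for k in range(extra)]
        start = (start + extra) % t
        rng.shuffle(block)  # randomize treatments to units within the block
        plan.append(block)
    return plan


def random_latin_square(treatments, seed=None):
    """Cyclic t x t square with rows, columns, and labels shuffled."""
    rng = random.Random(seed)
    t = len(treatments)
    labels = list(treatments)
    rows, cols = list(range(t)), list(range(t))
    for seq in (labels, rows, cols):
        rng.shuffle(seq)
    return [[labels[(r + c) % t] for c in cols] for r in rows]


treatments = ["T1", "T2", "T3", "T4", "T5"]
rcbd_plan = randomize_rcbd(treatments, block_size=8, n_blocks=3, seed=5)
square = random_latin_square(treatments, seed=5)

for block in rcbd_plan:
    print(block)
for row in square:
    print(row)
```

In the block plan each treatment appears once or twice per block, and in the square each treatment appears exactly once in every row block and every column block.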

4.2.2 Types of Treatment Structures 

The treatment structure consists of the various treatments or treatment combinations or 
factors and factor combinations that the experimenter wishes to study. The components of 
the treatment structure should be selected so as to relate to the objectives of the experiment,
which should be specified in the protocol or description of the study. Next, some
common types of treatment structures are described, each of which is examined in more
detail in Section 4.3. 





1) One-way treatment structure. The one-way treatment structure consists of a set of 
t treatments or populations where there is no assumed structure among the treatments.
There can be a relationship among the treatments such as using four temperatures,
120, 130, 150, and 160°C. If the treatments are constructed by combining
two or more factors, the factors are not used in the representation of the treatments 
in the model. The one-way treatment structure can be used to represent any set of 
treatments and is often used to represent factorial treatment structures when some 
of the possible treatment combinations are missing. This approach is used in 
Chapter 13. 

2) Two-way treatment structure. A two-way treatment structure consists of the set of
treatments constructed by combining the levels or possibilities of two different
factors. The resulting set of treatments, called treatment combinations, is generated
by combining each possibility for one of the factors with each possibility
for the other factor. If the first factor has s possibilities and the second factor has r
possibilities, the combination produces sr treatment combinations. Figure 4.4
presents an example of a two-way treatment structure where factor A has three
possibilities, factor B has four possibilities, and the crossing generates 12 treatment
combinations.

3) Factorial arrangement treatment structure. A factorial arrangement treatment structure
consists of the set of treatment combinations constructed by combining the
levels of two or more factors. The two-way treatment structure is a two-way
factorial arrangement. Three-way on up to n-way treatment structures are also
factorial arrangements. An n-way treatment structure is generated by combining
the possibilities for n factors, where the factors have $s_1, s_2, \ldots, s_n$ possibilities,
respectively, which generates $s_1 \times s_2 \times \cdots \times s_n$ treatment combinations. Examples of
factorial arrangement treatment structures are scattered throughout the text.

4) Fractional factorial arrangement treatment structure. A fractional factorial arrangement
treatment structure consists of only a part, or fraction, of the possible treatment
combinations in a factorial arrangement treatment structure. There are many
systematic techniques for selecting an appropriate fraction, most of which depend
on the assumptions the experimenter makes about interactions among the various
types of factors in the treatment structure. An experiment may involve eight different
factors, each at two levels for a total of $2^8 = 256$ treatment combinations. The researcher
may want to look for important main effects and two-factor interactions. In that
case, a one-fourth fraction or 64 treatment combinations could be used in the study.
Such a design is often denoted as a $2^{8-2}$ fractional factorial arrangement. See
Milliken and Johnson (1989) for more details. A Latin square arrangement treatment
structure involves a three-way factorial arrangement with n row treatments,
n column treatments, and n cell treatments. The Latin square arrangement consists
of $n^2$ of the $n^3$ possible treatment combinations or is a $(1/n)$th $= (n^2/n^3)$th fraction of
the $n^3$ possible treatment combinations. One possible use of the Latin square
arrangement is when it can be assumed there are no two-way or three-way interactions
among the three factors.

5) Optimal design treatment structures. For many experimental situations all of the factors
in the treatment structure are quantitative and the objective of the study is to
collect data such that a particular linear or nonlinear model can be fit. The resulting
treatment combinations selected using one of the design criteria (St. John and
Draper, 1975) form an optimal design. This set of treatment combinations is called 
an optimal design treatment structure. 

6) Factorial arrangement with one or more controls. The desired treatment structure used 
to satisfy the goals of the experiment can include combining more than one treatment
structure. For example, a treatment structure for an experiment could consist
of combining a one-way treatment structure of c controls with a two-way
factorial arrangement treatment structure. Figure 4.5 contains one such treatment 
structure where the factorial arrangement consists of two levels of factor A and 
three levels of factor B combined with three controls. 

All of the above treatment structures can always be considered as a one-way treatment 
structure for analysis purposes. In particular, when the treatment structure is a complex 
combination of two or more treatment structures, as in Figure 4.5, it is usually best 
to consider the set of treatments as a one-way treatment structure when analyzing the 
resulting data. 
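
The counting in the treatment structures above, and the idea of relabeling any set of treatment combinations as a one-way treatment structure, can be sketched as follows. The factor names, level labels, and control labels are hypothetical placeholders.

```python
from itertools import product


def factorial_combinations(**factors):
    """Cross the levels of every factor; one dict per treatment combination."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


# Two-way treatment structure: 3 levels of A crossed with 4 levels of B.
two_way = factorial_combinations(A=["a1", "a2", "a3"], B=["b1", "b2", "b3", "b4"])

# Factorial arrangement (2 x 3) combined with three controls, then treated
# as a single one-way treatment structure with nine treatments.
factorial = factorial_combinations(A=["a1", "a2"], B=["b1", "b2", "b3"])
one_way = ["/".join(c.values()) for c in factorial] + ["control1", "control2", "control3"]

# Eight two-level factors and the size of a one-fourth fraction.
n_full = len(factorial_combinations(**{f"F{i}": [0, 1] for i in range(1, 9)}))
n_quarter = n_full // 4

print(len(two_way), len(one_way), n_full, n_quarter)
```

This only enumerates and counts the combinations; choosing which 64 of the 256 combinations form a good fraction requires the systematic techniques mentioned in the text.
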

Split-plot and repeated measures design structures are constructed from incomplete
block design structures and factorial arrangement treatment structures involving two or 
more factors. In effect, the combination of the design structure and the treatment structure 
for the split-plot and repeated measures designs generates different sizes of experimental 
units, a topic that must be addressed in order to obtain an appropriate analysis. Such 
designs are discussed in Chapter 5. 


4.3 Examples of Different Designed Experiments 

There is a vast amount of published information about various types of designs used for 
many types of experiments, for example, see Cochran and Cox (1957), Davies (1954), 
Federer (1955), Hicks (1993), John (1971), Kirk (1968), Cornell (1990), Anderson and McLean 
(1974), Box et al. (1978), Cobb (1997), Kempthorne (1952), Laundsby and Weese (1993),
Lentner and Bishop (1986), Mead (1988), Montgomery (1991), and Winer (1971). This section
contains several examples that demonstrate the design structures and treatment structures
described in Section 4.2. Hopefully, this discussion will help readers to apply these
concepts to their own experiments. In most examples, the resulting designed experiment is 
named by specifying the type of design structure and the type of treatment structure. For 
example, a designed experiment could consist of a two-way treatment structure in a randomized
complete block design structure. This method of describing a designed experiment
differs from that generally used in the literature, but the authors think using the
design and treatment structures is the best way to identify a designed experiment. In 
addition, this description also helps one to construct a suitable model and develop the 
appropriate analysis. For each experimental situation, the design structure and the treatment
structure are specified and the corresponding model and the resulting analysis of
variance table with the sources of variation and corresponding degrees of freedom are 
given. The formulas for computing sums of squares are not given in this section, but some 
examples with computations are included in other chapters. The emphasis of this chapter 
is determining the way the error sum of squares is computed and establishing the corresponding
degrees of freedom.

4.3.1 Example 4.1: Diets 

A nutritionist wants to study the effect of five diets on losing weight. The treatment structure
of this experiment is a one-way classification involving a single factor, called diet,
with five levels or five treatments. Many different design structures can be used to evaluate
the diets. If there are 20 homogeneous people, then a completely randomized design
structure can be used where each diet is randomly assigned to four people. One model for 
a one-way treatment structure in a completely randomized design structure is 

$$y_{ij} = \mu_i + \varepsilon_{ij}, \quad i = 1, 2, \ldots, t, \; j = 1, 2, \ldots, n_i \tag{4.3}$$

where $\mu_i$ denotes the mean of the ith treatment (diet) and $\varepsilon_{ij}$ denotes the random error. The
analysis of variance table for model (4.3), assuming the ideal conditions that $\varepsilon_{ij} \sim$ i.i.d. $N(0, \sigma^2)$,
is given in Table 4.1. Table 4.1 contains the different sources of variation and their respective
degrees of freedom for the model in Equation 4.3. The 15 degrees of freedom for experimental
error are obtained from the variation among persons treated alike. There are four persons
given diet 1, providing four persons treated alike, and the variation among these four persons'
weight loss values provides three degrees of freedom for error. There are four persons treated
alike for each diet. Thus, there are three degrees of freedom available for error from the data
for each of the five diets. If the variances are equal, then these five sets of three degrees of
freedom can be pooled into an error term involving 15 degrees of freedom. The methods in
Chapter 2 can be used to evaluate the plausibility of the equal variance assumption.

TABLE 4.1
Analysis of Variance Table for a One-Way Treatment Structure in a Completely Randomized Design Structure

Source of Variation    df
Diet                    4
Error                  15
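
The pooling of the five sets of three degrees of freedom can be sketched numerically. The weight-loss values below are invented solely for illustration, and the helper name is hypothetical; equal variances across diets are assumed, as in the text.

```python
import statistics


def pooled_error(groups):
    """Pool the within-group variation of units treated alike.

    Returns (pooled variance, error degrees of freedom).
    """
    dfs = [len(g) - 1 for g in groups]
    sums_of_squares = [statistics.variance(g) * d for g, d in zip(groups, dfs)]
    error_df = sum(dfs)
    return sum(sums_of_squares) / error_df, error_df


# Hypothetical weight losses for five diets, four persons per diet.
diets = [
    [4.1, 5.3, 3.8, 4.9],
    [6.2, 5.5, 7.0, 6.4],
    [3.0, 2.4, 3.9, 3.3],
    [5.1, 4.4, 5.8, 5.0],
    [7.2, 6.1, 6.8, 7.5],
]

pooled_var, error_df = pooled_error(diets)
diet_df = len(diets) - 1
print(f"diet df = {diet_df}, error df = {error_df}, pooled variance = {pooled_var:.3f}")
```

Because every diet has the same number of persons, the pooled variance here is simply the average of the five within-diet variances.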

Assume the researcher could enroll and interview just five persons during a given time 
period. Let time period be a blocking factor and randomly assign the five diets to one person
within each time period. The design is a one-way treatment structure in a randomized
complete block design structure with four blocks of size five. A model that can be used to 
describe data from this design is 

$$y_{ij} = \mu_i + b_j + \varepsilon_{ij}, \quad i = 1, 2, 3, 4, 5, \; j = 1, 2, 3, 4 \tag{4.4}$$

where $\mu_i$ denotes the mean of the ith treatment (diet), $b_j$ denotes the effect of the jth block
and $\varepsilon_{ij}$ denotes the random error associated with the person assigned to the ith treatment in
the jth block. The analysis of variance table for model (4.4) is displayed in Table 4.2. The
error degrees of freedom are computed by using four orthogonal contrasts of the treatments
within each block, such as


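$$\begin{aligned}
q_{1j} &= y_{1j} - y_{2j}, & j &= 1, 2, 3, 4 \\
q_{2j} &= y_{1j} + y_{2j} - 2y_{3j}, & j &= 1, 2, 3, 4 \\
q_{3j} &= y_{1j} + y_{2j} + y_{3j} - 3y_{4j}, & j &= 1, 2, 3, 4 \\
q_{4j} &= y_{1j} + y_{2j} + y_{3j} + y_{4j} - 4y_{5j}, & j &= 1, 2, 3, 4
\end{aligned}$$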

TABLE 4.2 

Analysis of Variance Table for a One-Way Treatment Structure 
in a Randomized Complete Block Design Structure with One 
Replication per Treatment in Each Block 

Source of Variation    df 
Block (date)            3 
Diet                    4 
Error                  12 


The $q_{ij}$ values for the same value of i all have the same mean, indicating they are all treated 
alike. Thus the variance of each set of four values of $q_{ij}$ provides three degrees of freedom 
for error, so there are four sets of three degrees of freedom that can be pooled (if the 
variances are equal) into the error sum of squares with 12 degrees of freedom. However, 
the variances of the $q_{ij}$ are not equal and must be rescaled before pooling. The variances are 
$\mathrm{Var}(q_{1j}) = 2\sigma^2$, $\mathrm{Var}(q_{2j}) = 6\sigma^2$, $\mathrm{Var}(q_{3j}) = 12\sigma^2$, and $\mathrm{Var}(q_{4j}) = 20\sigma^2$ for j = 1, 2, 3, 4, 
so the sum of squares computed from each set of $q_{ij}$ needs to be divided by the corresponding 
coefficient of $\sigma^2$ before pooling. In effect, these pooled sums of squares provide the block 
by diet interaction sum of squares. Therefore, a second way of obtaining the variance of 
things treated alike is to use the block-by-treatment interaction to obtain the experimental 
error estimate for a randomized complete block design structure. 
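
The pooling argument for model (4.4) can be checked numerically. The following is a minimal sketch, assuming simulated data and NumPy; the function names `pooled_contrast_ss` and `interaction_ss` are hypothetical helpers introduced only for this illustration. It builds the four contrasts within each block, rescales each set by its coefficient of $\sigma^2$ (2, 6, 12, 20), and compares the pooled result with the block by diet interaction sum of squares.

```python
import numpy as np

def pooled_contrast_ss(y):
    """Pool the rescaled sums of squares of the within-block contrasts.

    y is a (treatments x blocks) array with one observation per cell.
    """
    n_treatments = y.shape[0]
    pooled = 0.0
    for i in range(1, n_treatments):
        # Contrast i compares the first i treatments with treatment i + 1:
        # (1, -1, 0, 0, 0), (1, 1, -2, 0, 0), (1, 1, 1, -3, 0), ...
        coef = np.zeros(n_treatments)
        coef[:i] = 1.0
        coef[i] = -float(i)
        q = coef @ y                 # one contrast value per block
        scale = coef @ coef          # 2, 6, 12, 20 when there are 5 treatments
        pooled += np.sum((q - q.mean()) ** 2) / scale
    return float(pooled)

def interaction_ss(y):
    """Block-by-treatment interaction sum of squares of a two-way table."""
    resid = (y - y.mean(axis=1, keepdims=True)
               - y.mean(axis=0, keepdims=True) + y.mean())
    return float(np.sum(resid ** 2))

# Simulated weight-loss values: 5 diets (rows) by 4 time periods (columns).
rng = np.random.default_rng(0)       # arbitrary seed
y = rng.normal(loc=10.0, scale=2.0, size=(5, 4))

print(pooled_contrast_ss(y), interaction_ss(y))
```

The two printed values agree up to rounding, which is the sense in which the rescaled contrasts reproduce the 12 degree of freedom error term of Table 4.2.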

Next, suppose there are not 20 homogeneous persons available, but there are 10 homogeneous 
males and 10 homogeneous females. One strategy would be to use sex of person as 
a blocking factor where there are two blocks of size 10. A randomized complete block 
design structure could be used where each diet would be randomly assigned to two males 
and two females, so there are two replications of each diet within each block. The model 
for a one-way treatment structure in a randomized complete block design structure is 

$$y_{ijk} = \mu_i + b_j + \varepsilon_{ijk}, \quad i = 1, 2, 3, 4, 5, \quad j = 1, 2, \quad k = 1, 2 \tag{4.5}$$

where $\mu_i$ denotes the mean of the ith treatment (diet), $b_j$ denotes the effect of the jth 
block, and $\varepsilon_{ijk}$ denotes the random error associated with the kth person assigned the ith 
treatment in the jth block. The analysis of variance table for model (4.5) is displayed in 
Table 4.3. There is one degree of freedom associated with the design structure or sex of 
person that has been removed from the error term of Table 4.1. The error term for a one-way 
treatment structure in a randomized complete block design structure where there is 
one observation per treatment in each block is computed from the block by treatment 
interaction. This design involves blocking and multiple observations per treatment in each 
block. The number of degrees of freedom for the block by treatment interaction is equal to 
(2 - 1) x (5 - 1) or four degrees of freedom. The variability of the two observations of each 
diet within each block provides one degree of freedom. Thus there are five degrees of freedom 
for error from the comparisons of persons treated alike within each block, or 10 
degrees of freedom pooled across the two blocks. Pooling the block by treatment interaction 
with the within block variability provides 14 degrees of freedom for error. 
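
The degrees-of-freedom bookkeeping for models (4.4) and (4.5) follows one pattern, sketched below. The function name `rcbd_df` and its argument names are hypothetical, chosen only for this illustration; the sketch assumes every treatment appears the same number of times in every block.

```python
def rcbd_df(n_treatments, n_blocks, reps_per_cell=1):
    """Degrees of freedom for a one-way treatment structure in a
    randomized complete block design with equal replication per cell."""
    n_obs = n_treatments * n_blocks * reps_per_cell
    block = n_blocks - 1
    treatment = n_treatments - 1
    # Error from the block-by-treatment interaction.
    interaction = block * treatment
    # Error from units treated alike within the same block.
    within_cell = n_treatments * n_blocks * (reps_per_cell - 1)
    return {
        "block": block,
        "treatment": treatment,
        "block_x_treatment": interaction,
        "within_cell": within_cell,
        "error": interaction + within_cell,
        "total": n_obs - 1,
    }

# Model (4.4): five diets, four time periods, one person per cell.
print(rcbd_df(5, 4))
# Model (4.5): five diets, two blocks (sex), two persons per cell.
print(rcbd_df(5, 2, reps_per_cell=2))
```

For model (4.5) the sketch separates the four interaction degrees of freedom from the 10 within-block degrees of freedom before pooling them into the 14 shown in Table 4.3.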

TABLE 4.3 

Analysis of Variance Table for a One-Way Treatment Structure 
in a Randomized Complete Block Design Structure with Two 
Replications per Treatment in Each Block 

Source of Variation      df 
Block (sex of person)     1 
Diet                      4 
Error                    14 


In most cases sex of person is not a good choice for a blocking factor since the treatments 
(diets in this case) might interact with sex of person. If the factor (or factors) selected 
to construct blocks can possibly interact with the treatments, then that factor needs to be 
included in the treatment structure either as a stratification factor or in an analysis of 
covariance with possibly unequal slopes to carry out an appropriate analysis (Milliken 
and Johnson, 2001). In this example, sex of the person must be combined with the five diets 
to form a two-way factorial arrangement or a two-way treatment structure. This two-way 
treatment structure consists of the 10 treatment combinations generated by combining the 
two levels of sex of person with the five levels of diet. By switching sex of the person from 
the design structure to the treatment structure, the resulting design structure is a completely 
randomized design with two replications of each treatment combination. The randomization 
scheme is to randomly assign each diet to two males and to two females. This 
is the same randomization scheme as for model (4.5), but with different treatment and 
design structures. One model for a two-way treatment structure in a completely randomized 
design structure is the means model: 

$$y_{ijk} = \mu_{ij} + \varepsilon_{ijk}, \quad i = 1, 2, \ldots, 5, \quad j = 1, 2, \quad k = 1, 2 \tag{4.6}$$

where $\mu_{ij}$ denotes the mean of the ijth treatment combination (sex of person by diet) and $\varepsilon_{ijk}$ 
denotes the random error. Sometimes the mean $\mu_{ij}$ is expressed as an effects model: 

$$\mu_{ij} = \mu + \tau_i + \beta_j + \gamma_{ij}$$

where $\mu$ is the overall mean, $\tau_i$ is the effect of the ith diet, $\beta_j$ is the effect of the jth sex of 
person, and $\gamma_{ij}$ is the interaction effect. The analysis of variance tables for model (4.6) for 
both expressions of $\mu_{ij}$ are given in Table 4.4. 
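
The link between the means model and the effects model can be made concrete with a short sketch. It assumes the common sum-to-zero parameterization (an assumption, since the text does not fix the constraints here) and uses made-up cell means; `effects_from_means` is a hypothetical helper name.

```python
import numpy as np

def effects_from_means(mu):
    """Split a (diets x sexes) table of cell means into overall mean,
    diet effects, sex effects, and interaction effects, using
    sum-to-zero constraints (an assumed parameterization)."""
    overall = mu.mean()
    tau = mu.mean(axis=1) - overall            # diet effects
    beta = mu.mean(axis=0) - overall           # sex-of-person effects
    gamma = mu - overall - tau[:, None] - beta[None, :]
    return overall, tau, beta, gamma

# Hypothetical cell means: 5 diets (rows) by 2 sexes (columns).
mu = np.array([[4.0, 5.0],
               [6.0, 6.5],
               [3.0, 5.5],
               [7.0, 7.0],
               [5.0, 4.0]])

overall, tau, beta, gamma = effects_from_means(mu)
rebuilt = overall + tau[:, None] + beta[None, :] + gamma
print(np.allclose(rebuilt, mu))
```

Under these constraints the 9 degrees of freedom among the 10 cell means split as 4 for diet, 1 for sex, and 4 for the interaction, matching the two halves of Table 4.4.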

TABLE 4.4 

Analysis of Variance Table for a Two-Way Treatment Structure 
in a Completely Randomized Design Structure for Both Means 
and Effects Models 

Source of Variation                                      df 
$\mu_{ij}$ model 
  Sex x diet                                              9 
  Error                                                  10 
$\mu + \tau_i + \beta_j + \gamma_{ij}$ model 
  Sex                                                     1 
  Diet                                                    4 
  Sex x diet                                              4 
  Error                                                  10 


Next, suppose that the diets have a structure consisting of a control diet and four diets 
made up of the four combinations of two protein levels and two carbohydrate levels, as 
shown in Figure 4.6. The diet treatment structure is a two-way factorial arrangement with 
a control that, when crossed with sex of person, generates a three-way treatment structure 
(protein x carbohydrate x sex) with two controls, where there is one control for males and 
one control for females. The design structure is completely randomized where each treatment 
combination is to be assigned to two persons. A model that can be used to describe 
these data is 

$$y_{ijk} = \mu_{ij} + \varepsilon_{ijk}, \quad i = 0, 1, 2, 3, 4, \quad j = 1, 2, \quad k = 1, 2 \tag{4.7}$$

where $\mu_{01}$ and $\mu_{02}$ denote the means of the controls and the $\mu_{ij}$, i = 1, 2, 3, 4 and j = 1, 2, denote 
the means of the diet by sex of person treatment combinations. The analysis of variance 
table for model (4.7) is in Table 4.5, where "Control vs $2^2$" denotes a comparison between the 
control diet and the average of the four protein by carbohydrate treatment combinations. 
The complete analysis would most likely be accomplished using the two-way treatment 
structure of diet by sex of person with appropriate contrasts of the treatment means to 
provide the comparisons involving protein, carbohydrate, sex of person, and the controls. 
The construction of such contrasts is discussed in Chapter 6 and beyond. 


                 Carbohydrate 
                 1         2 
Protein    1     P1C1      P1C2 
           2     P2C1      P2C2 

Control 

FIGURE 4.6 Structure of diets with a two-way factorial arrangement plus one control for the treatment structure. 
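
One possible set of contrasts for partitioning the four diet degrees of freedom is sketched here. The diet ordering (control, P1C1, P1C2, P2C1, P2C2) and the specific coefficients are assumptions made for this illustration, not taken from the text; `are_orthogonal_contrasts` is a hypothetical helper. With equal replication, contrasts whose coefficient vectors are mutually orthogonal give single degree of freedom sums of squares that add up to the diet sum of squares.

```python
import numpy as np

# Assumed diet order: control, P1C1, P1C2, P2C1, P2C2.
contrasts = {
    "control_vs_2x2":         np.array([4, -1, -1, -1, -1]),
    "protein":                np.array([0, -1, -1,  1,  1]),
    "carbohydrate":           np.array([0, -1,  1, -1,  1]),
    "protein_x_carbohydrate": np.array([0,  1, -1, -1,  1]),
}

def are_orthogonal_contrasts(vectors):
    """True if every vector sums to zero and all pairs are orthogonal."""
    vectors = list(vectors)
    sums_to_zero = all(v.sum() == 0 for v in vectors)
    pairwise = all(vectors[a] @ vectors[b] == 0
                   for a in range(len(vectors))
                   for b in range(a + 1, len(vectors)))
    return sums_to_zero and pairwise

print(are_orthogonal_contrasts(contrasts.values()))

# Applying one contrast to hypothetical diet means (made-up numbers).
diet_means = np.array([2.0, 3.5, 4.0, 5.0, 6.5])
print(contrasts["protein"] @ diet_means)
```

Crossing each of these four contrasts with the sex comparison gives the four single degree of freedom lines listed under Sex x Diet in Table 4.5.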


TABLE 4.5 

Analysis of Variance Table for a Treatment Structure Consisting 
of Three-Way Factorial Arrangement Combined with Two 
Controls in a Completely Randomized Design Structure 

Source of Variation                   df 
Sex                                    1 
Diet                                   4 
  Control vs $2^2$                     1 
  Protein                              1 
  Carbohydrate                         1 
  Protein x carbohydrate               1 
Sex x diet                             4 
  Sex x control vs $2^2$               1 
  Sex x protein                        1 
  Sex x carbohydrate                   1 
  Sex x protein x carbohydrate         1 
Error                                 10 


4.3.2 Example 4.2: House Paint 

A paint company wants to compare the abilities of four white house paints to withstand 
environmental conditions. Four square houses, each with one side facing exactly north, 
were available for the experiment, thus houses can be used as a blocking factor. Each side 
of a house is possibly exposed to different types of weather, thus the sides (indicated here 
by directions north, south, east, and west) of the houses can also be used as a blocking 
factor. Since the number of treatments (the four paints) was the same as the number of 
levels of both blocking factors, a Latin square design structure can be used to study the 
paints. Here the paints can be assigned to sides of houses where each paint can occur once 
and only once on each house and once and only once in each direction. There are three 
basic Latin square arrangements (see Cochran and Cox, 1957). The randomization process 
is to randomly select one of the three possible arrangements, then randomly assign the 
rows to the houses, randomly assign the directions to the columns and randomly assign 
the types of paint to the letters in the square. One such arrangement of assigning paints to 
houses and directions is shown in Table 4.6. 
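
The randomization steps described above can be sketched in code. This is a minimal illustration under stated assumptions: it starts from a single cyclic 4 x 4 square rather than choosing among the basic arrangements, and the names `randomize_latin_square`, `is_latin_square`, and the paint labels are hypothetical.

```python
import random

def is_latin_square(square):
    """True if every symbol occurs exactly once in each row and column."""
    n = len(square)
    symbols = set(square[0])
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = all({square[r][c] for r in range(n)} == symbols
                  for c in range(n))
    return len(symbols) == n and rows_ok and cols_ok

def randomize_latin_square(square, treatments, rng):
    """Randomly permute rows (houses), columns (directions), and the
    assignment of treatments to the letters of a starting square."""
    n = len(square)
    rows = list(range(n))
    cols = list(range(n))
    rng.shuffle(rows)
    rng.shuffle(cols)
    letters = sorted({cell for row in square for cell in row})
    shuffled = list(treatments)
    rng.shuffle(shuffled)
    relabel = dict(zip(letters, shuffled))
    return [[relabel[square[r][c]] for c in cols] for r in rows]

# Assumed starting arrangement: a cyclic square on letters A-D.
standard = [list("ABCD"), list("BCDA"), list("CDAB"), list("DABC")]
paints = ["paint 1", "paint 2", "paint 3", "paint 4"]  # hypothetical labels

rng = random.Random(2024)  # arbitrary seed so the plan can be reproduced
plan = randomize_latin_square(standard, paints, rng)
for house, row in enumerate(plan, start=1):
    print(house, row)
```

Permuting rows, columns, and labels never breaks the once-per-row, once-per-column property, which is why the check on `plan` succeeds whatever the seed.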

The design of the experiment is a one-way treatment structure in a Latin square design 
structure. A model that can be used to describe data from this experiment is 

$$y_{ijk} = \mu_i + h_j + d_k + \varepsilon_{ijk}, \quad i = 1, 2, 3, 4, \quad j = 1, 2, 3, 4, \quad k = 1, 2, 3, 4 \tag{4.8}$$

where $\mu_i$ denotes the mean wearability score for the ith paint, $h_j$ denotes the effect of the 
jth house, $d_k$ denotes the effect of the kth direction, and $\varepsilon_{ijk}$ denotes the experimental unit 
error. The analysis of variance table for model (4.8) is given in Table 4.7. The error for the 
Latin square design structure consists of contrasts that measure the paint by house by 
direction interaction. 


TABLE 4.6 

Assignment of a Set of Treatments from a One-Way 
Treatment Structure to a Latin Square Design Structure 

               Directions 
House    North    South    East    West 
1        A        B        C       D 
2        D        A        B       C 
3        C        D        A       B 
4        B        C        D       A 


TABLE 4.7 

Analysis of Variance Table for a One-Way Treatment 
Structure in a Latin Square Design Structure 

Source of Variation    df 
House                   3 
Direction               3 
Paint                   3 
Error                   6 


TABLE 4.8 

Two-Way Treatment Structure for House Paint Example 

                    Additive II 
Additive I    None          Some 
None          Base          Base + II 
Some          Base + I      Base + I + II 


Next, suppose the paints have a structure as given by 1) base paint, 2) base paint plus 
additive I, 3) base paint plus additive II, and 4) base paint plus both additive I and 
additive II. This structure of the paints provides a two-way treatment structure where one 
factor is additive I at two levels (zero and some) and the second factor is additive II at 
two levels (zero and some). The resulting four treatment combinations are shown in 
Table 4.8. 

One model for a two-way treatment structure in a Latin square design structure is 

$$y_{ijkm} = \mu + \gamma_i + \beta_j + (\gamma\beta)_{ij} + h_k + d_m + \varepsilon_{ijkm}, \quad i = 1, 2, \quad j = 1, 2, \quad k = 1, 2, 3, 4, \quad m = 1, 2, 3, 4 \tag{4.9}$$

where $\gamma_i$ denotes the effect of additive I, $\beta_j$ denotes the effect of additive II, and $(\gamma\beta)_{ij}$ denotes 
the interaction between the two additives. The analysis of variance table for model (4.9) is 
given in Table 4.9. The only difference between analyzing models (4.8) and (4.9) is that in 
model (4.9) the paints have a structure that is used to partition the paint effect into effects 
due to additive I, additive II, and the interaction of additives I and II. The part of the analysis 
corresponding to the design structure remains unaffected even though the analysis of 
the treatment structure has changed. 

Finally, suppose eight houses were available so that the experiment could be conducted 
by using two different Latin square design structures. Table 4.10 shows one possible 
assignment of paints to the house-direction combinations. If the paints have the two-way 
treatment structure in Table 4.8, then a model is given by 

$$y_{ijkmn} = \mu + \gamma_i + \beta_j + (\gamma\beta)_{ij} + s_k + h_{n(k)} + d_m + \varepsilon_{ijkmn}, \quad i = 1, 2, \quad j = 1, 2, \tag{4.10}$$

$$k = 1, 2, \quad m = 1, 2, 3, 4, \quad n = 1, 2, 3, 4$$

where $s_k$ denotes the effect of square k and $h_{n(k)}$ denotes the effect of house n in square k. The 
analysis of variance table for model (4.10) is given in Table 4.11. 


TABLE 4.9 

Analysis of Variance Table for a Two-Way Treatment 
Structure in a Latin Square Design Structure 

Source of Variation            df 
House                           3 
Direction                       3 
Paint                           3 
  Additive I                    1 
  Additive II                   1 
  Additive I x additive II      1 
Error                           6 


TABLE 4.10 

Arrangement Showing a One-Way Treatment Structure in a Replicated 
Latin Square Design Structure 

                      House 
                Square 1               Square 2 
Direction    1    2    3    4       5    6    7    8 
N            C    A    B    D       D    C    A    B 
S            D    B    C    A       C    B    D    A 
E            A    C    D    B       A    D    B    C 
W            B    D    A    C       B    A    C    D 


TABLE 4.11 

Analysis of Variance Table for a Two-Way Treatment 
Structure in a Repeated Latin Square Design Structure 

Source of Variation            df 
Houses                          7 
  Squares                       1 
  Houses (square)               6 
Direction                       3 
Paint                           3 
  Additive I                    1 
  Additive II                   1 
  Additive I x additive II      1 
Error                          18 
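
As a check on Table 4.11, the degrees of freedom for the replicated Latin square can be accounted for as follows. This is a worked tally under model (4.10), with two squares of four houses each and four directions; the error line is obtained by subtraction.

```latex
\begin{align*}
\text{Total}     &= 2 \times 4 \times 4 - 1 = 31, \\
\text{Houses}    &= 8 - 1 = 7 = \underbrace{(2 - 1)}_{\text{squares}}
                    + \underbrace{2\,(4 - 1)}_{\text{houses within squares}}, \\
\text{Direction} &= 4 - 1 = 3, \\
\text{Paint}     &= 4 - 1 = 3 = 1 + 1 + 1, \\
\text{Error}     &= 31 - 7 - 3 - 3 = 18 .
\end{align*}
```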


4.3.3 Example 4.3: Steel Plates 

A Latin square design structure is very useful when there is a need to block in two 
directions, but not every Latin square arrangement used by experimenters is a Latin square 
design structure. This example is used to demonstrate the consequences of using a Latin 
square arrangement treatment structure. Two types of paint additives are to be combined 
and steel plates are to be painted. The objective of the experiment is to study the ability of 
the paint combinations to protect steel from heat. There are five levels of each paint 
additive and five temperatures at which to check the protecting ability. This experiment is 
suited for a Latin square array where the levels of additive I are assigned to the rows, the 
levels of additive II are assigned to the columns, and the levels of temperature are assigned 
to the cells within the square. This arrangement generates 25 treatment combinations. The 
experimental units are 25 sheets of steel 0.2 cm thick and 1 m$^2$ in area. The randomization 
process is to randomly assign one of the 25 treatment combinations to each of the 25 sheets 
of steel. In this case, the treatment structure is a fraction of a $5^3$ factorial arrangement 
(as it consists of 25 of the 125 possible treatment combinations of additive I x additive 
II x temperature), called a Latin square arrangement or Latin square treatment structure. 
The design structure is a completely randomized design, as the treatment combinations 
are assigned completely at random to the sheets of steel. Since this is a fractional factorial, 
each main effect is partially aliased (see Cochran and Cox, 1957, p. 245) with the two-factor 
interaction of the other two factors and the three-factor interaction. In order to properly 
analyze this experimental design, some assumptions must be made about the parameters 
in the model. The usual assumptions are that there are no two-way interactions and no 
three-way interaction. However, one should be very careful not to make such assumptions 
without having some prior information (which can come from other experiments, existing 
literature, etc.) showing that the interactions are, in fact, negligible. One such Latin square 
arrangement is given in Table 4.12, and a model for a Latin square treatment structure in a 
completely randomized design structure is 

$$y_{ijk} = \mu + AI_i + AII_j + T_k + \varepsilon_{ijk}, \quad (i, j, k) \in \text{Index} \tag{4.11}$$

where $AI_i$ denotes the effect of the ith level of additive I, $AII_j$ denotes the effect of the jth 
level of additive II, $T_k$ denotes the effect of the kth level of temperature, and Index denotes 
an index set consisting of the 25 treatment combinations observed in the experiment. 

If one ignores the application of the levels of temperature, the resulting data table is that 
of a two-way treatment structure in a completely randomized design structure, as shown 
in Table 4.13. The analysis of variance table for this two-way treatment structure is in 
Table 4.14. 


TABLE 4.12 

Latin Square Arrangement Treatment Structure, Where $T_i$ Denotes the ith 
Level of Temperature 

                           Level of Additive II 
Level of Additive I    1        2        3        4        5 
1                      $T_1$    $T_2$    $T_3$    $T_4$    $T_5$ 
2                      $T_5$    $T_1$    $T_2$    $T_3$    $T_4$ 
3                      $T_4$    $T_5$    $T_1$    $T_2$    $T_3$ 
4                      $T_3$    $T_4$    $T_5$    $T_1$    $T_2$ 
5                      $T_2$    $T_3$    $T_4$    $T_5$    $T_1$ 


TABLE 4.13 

The Two-Way Treatment Structure for the Levels of Additive I by the Levels of 
Additive II, Ignoring the Levels of Temperature in the Latin Square Arrangement 

                           Level of Additive II 
Level of Additive I    1        2        3        4        5 
1                      (1,1)    (1,2)    (1,3)    (1,4)    (1,5) 
2                      (2,1)    (2,2)    (2,3)    (2,4)    (2,5) 
3                      (3,1)    (3,2)    (3,3)    (3,4)    (3,5) 
4                      (4,1)    (4,2)    (4,3)    (4,4)    (4,5) 
5                      (5,1)    (5,2)    (5,3)    (5,4)    (5,5) 


TABLE 4.14 

Analysis of Variance Table for Two-Way Treatment 
Structure Part of the Latin Square Arrangement 

Source                       df 
Additive I                    4 
Additive II                   4 
Additive I x additive II     16 
Error                         0 


The design consists of one observation per treatment combination, so there are 
no degrees of freedom available for estimating the error, as there are no sheets of steel treated 
alike. The interaction between the levels of AI and the levels of AII is associated with 
16 degrees of freedom. Now, when temperature is included in the structure, the sum of 
squares due to testing the equality of temperature means is part of the AI by AII interaction 
sum of squares, as denoted in Table 4.15. This means that part of the AI by AII interaction 
is identical to the temperature effect, or four degrees of freedom associated with temperature 
are aliased with four degrees of freedom associated with the AI by AII interaction. 
Likewise, the four degrees of freedom associated with the AI effect are aliased with four 
degrees of freedom of the AII by temperature interaction and the four degrees of freedom 
associated with the AII effect are aliased with four degrees of freedom of the AI by 
temperature interaction. The analysis of variance table for the Latin square treatment structure 
using model (4.11) is given in Table 4.16. The term "residual" is used rather than "error" 
since the corresponding sum of squares involves error plus any interaction effects that may 
not be zero. If the assumption of zero interactions is not correct, then the residual mean 
square will be too large and the resulting F-statistics will be too small. Consequently, if there is 
interaction in the experiment, it cannot be discovered and any other detectable treatment 
effects may be masked. 
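
The aliasing of temperature with the AI by AII interaction can be illustrated with indicator matrices. The sketch below is an assumption-laden illustration: it uses the cyclic 5 x 5 pattern of Table 4.12 with zero-based level indices, and the helper names `indicator` and `centered` are hypothetical. It checks that the temperature comparisons are orthogonal to both sets of main-effect comparisons (so, with one observation per AI by AII cell, they must lie in the interaction space) and then counts degrees of freedom by matrix rank.

```python
import numpy as np

N_LEVELS = 5

# Cyclic square (zero-based): row i is the level of additive I, column j the
# level of additive II, and the temperature level is k = (j - i) mod 5.
cells = [(i, j, (j - i) % N_LEVELS)
         for i in range(N_LEVELS) for j in range(N_LEVELS)]

def indicator(levels):
    """One 0/1 column per factor level, one row per sheet of steel."""
    levels = np.asarray(levels)
    matrix = np.zeros((levels.size, N_LEVELS))
    matrix[np.arange(levels.size), levels] = 1.0
    return matrix

def centered(matrix):
    """Remove column means so columns represent comparisons among levels."""
    return matrix - matrix.mean(axis=0)

A = indicator([c[0] for c in cells])   # additive I
B = indicator([c[1] for c in cells])   # additive II
T = indicator([c[2] for c in cells])   # temperature

# Cross-products between main-effect and temperature comparisons.
cross_with_A = centered(A).T @ centered(T)
cross_with_B = centered(B).T @ centered(T)

ones = np.ones((len(cells), 1))
rank_main = np.linalg.matrix_rank(np.hstack([ones, A, B]))
rank_all = np.linalg.matrix_rank(np.hstack([ones, A, B, T]))

interaction_df = len(cells) - rank_main   # AI x AII when temperature is ignored
temperature_df = rank_all - rank_main     # part of that interaction
residual_df = len(cells) - rank_all       # what is left for "residual"
print(interaction_df, temperature_df, residual_df)
```

The three counts correspond to the 16, 4, and 12 degrees of freedom shown for the interaction, temperature, and residual lines in Table 4.15.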

TABLE 4.15 

Analysis of Variance Table for Two-Way Treatment Structure Part of 
the Latin Square Arrangement with the Variation for Temperatures 
Partitioned from the Additive I by Additive II Interaction 

Source                       df 
Additive I                    4 
Additive II                   4 
Additive I x additive II     16 
  Temperature                 4 
  Residual                   12 
Error                         0 


TABLE 4.16 

Analysis of Variance Table for the Latin 
Square Arrangement Treatment Structure 

Source          df 
Additive I       4 
Additive II      4 
Temperature      4 
Residual        12 


4.3.4 Example 4.4: Levels of N and K 

A model and the resulting analysis consist of three basic components: i) the treatment 
structure, ii) the design structure, and iii) the error structure(s). This example demonstrates 
how the three basic components can be used to construct the model. A plant breeder wants 
to study the effect of combining three levels of nitrogen (N) and four levels of potassium 
(K) on the yield of his new variety of corn. His treatment structure is a two-way factorial 
arrangement with 12 (3 levels of N x 4 levels of K) treatment combinations. He has three 
parcels of land on which to carry out the experiment and he uses these parcels of land as 
blocks. Each of the blocks is partitioned into 12 parts called plots. Each treatment combination 
is randomly assigned to one plot within each block. Thus, the design structure is a 
randomized complete block design since each treatment combination occurs once in each 
block. The total design of the experiment is called a two-way treatment structure in a 
randomized complete block design structure. (Blocks in a randomized complete block design 
are called replications by some authors; however, we prefer to call them blocks instead of 
replications in order to distinguish them from replications in the completely randomized 
design structure. The discussion in Example 4.5 describes the important differences 
between blocks and replications.) The model for this example is 

$$y_{ijk} = \mu_{ij} + b_k + \varepsilon_{ijk}, \quad i = 1, 2, 3, \quad j = 1, 2, 3, 4, \quad k = 1, 2, 3 \tag{4.12}$$

where $\mu_{ij}$ is the mean of the ith level of N with the jth level of K, $b_k$ is the effect of the kth 
block, and $\varepsilon_{ijk}$ denotes the random error associated with the plots within each block. In 
general, the model is constructed by summing the models for each of the three 
structures as 

Y = treatment structure + design structure + error structure(s) (4.13) 

Likewise, the corresponding analysis of variance table has three parts. The general 
analysis of variance table for model (4.13) is given in Table 4.17. The analysis of variance 
table for model (4.12) is given in Table 4.18. In general, allowance must be made for the 
possibility of more than one error term. For example, split-plot and repeated measures 
models have more than one error term (see Chapter 5). 


TABLE 4.17 

Analysis of Variance Table for the General Model 

Source of Variation      df 
Design structure         $df_{DS}$ 
Treatment structure      $df_{TS}$ 
Error structure(s)       $df_{ERROR(S)}$ 


TABLE 4.18 

Analysis of Variance Table for a Two-Way Treatment Structure 
in a Randomized Complete Block Design Structure 

Source of Variation        df 
Design structure 
  Blocks                    2 
Treatment structure        11 
  N                         2 
  K                         3 
  N x K                     6 
Error structure 
  Block x treatment        22 


4.3.5 Example 4.5: Blocks and Replications 

In many of the textbooks on the design of experiments, there is either no distinction made 
between blocks and replications or, at the very least, there is confusion about the distinction. 
This set of examples is included to demonstrate the difference between the two concepts as 
well as to illustrate that the combination of a treatment structure with a design structure can 
result in more than one total design of the experiment. Suppose the researcher wants to study 
the effect of four treatments in a one-way treatment structure using a design structure with 
six blocks of size two (she has only two homogeneous experimental units per block). In this 
case, the required design structure is an incomplete block design. If there are enough blocks 
so that every pair of treatments can occur together in a block the same number of times, then 
it is possible to use a balanced incomplete block design structure (Cochran and Cox, 1957). 
For example, the four treatments could be assigned to blocks, as shown in Table 4.19. In this 
case, there are six blocks in the design structure, and each treatment is replicated three times. 
This example is used to point out that the concepts of blocks and replications are different 
and to emphasize that blocks and replications should always be kept separate. Blocks and 
replications are equivalent only for the case of a randomized complete block design structure 
where each treatment is observed once and only once within each block. The randomization 
process for the incomplete block design structure consists of assigning blocks to block 
numbers and then randomly assigning the two treatments designated for each block to the two 
experimental units within that block. For the example in Table 4.19, the design structure is 
associated with the six blocks, not the three replications that just happen to occur 
because of the assignment process. The model for the arrangement in Table 4.19 is 

$$y_{ij} = \mu_i + b_j + \varepsilon_{ij}, \quad \text{for } (i, j) \in \text{Index} \tag{4.14}$$

where Index = {(A, 1), (B, 1), (A, 2), (C, 2), (A, 3), (D, 3), (B, 4), (C, 4), (B, 5), (D, 5), (C, 6), (D, 6)} 
and the pair (i, j) can take on only those values of treatment x block combinations that are 
observed as indicated by the Index set. The analysis of variance table for model (4.14) is given 
in Table 4.20. The degrees of freedom associated with this connected block-treatment 
arrangement are computed from the degrees of freedom associated with the block by treatment 
interaction as if all combinations were observed minus the number of empty cells. Table 4.21 
is a display of the observed block-treatment combinations (denoted by the "X"). There are 
six blocks and four treatments, so if all combinations were present, the block by treatment 
interaction would be based on (6 - 1)(4 - 1) = 15 degrees of freedom. There are 12 missing 
cells, so the number of degrees of freedom associated with the error term is 15 - 12 = 3. 


TABLE 4.19 

First Assignment of Four Treatments to Six Blocks of Two Experimental 
Units, Providing a Balanced Incomplete Block Design Structure 

Block 1    Block 2    Block 3    Block 4    Block 5    Block 6 
A          A          A          B          B          C 
B          C          D          C          D          D 
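
The empty-cell count and the balance property of Table 4.19 can be verified with a few lines of code. This is a sketch only; the dictionary layout and the helper names `pair_concurrences`, `is_balanced`, and `error_df` are assumptions made for illustration, and the degrees-of-freedom rule coded here is the one stated above for a connected arrangement.

```python
from collections import Counter
from itertools import combinations

# Table 4.19: block number -> treatments assigned to that block.
table_4_19 = {1: ("A", "B"), 2: ("A", "C"), 3: ("A", "D"),
              4: ("B", "C"), 5: ("B", "D"), 6: ("C", "D")}

def pair_concurrences(blocks):
    """Count how often each pair of treatments shares a block."""
    counts = Counter()
    for members in blocks.values():
        for pair in combinations(sorted(members), 2):
            counts[pair] += 1
    return counts

def is_balanced(blocks):
    """True if every pair of treatments occurs together equally often."""
    treatments = sorted({t for members in blocks.values() for t in members})
    counts = pair_concurrences(blocks)
    return len({counts[pair] for pair in combinations(treatments, 2)}) == 1

def error_df(blocks):
    """Full block-by-treatment interaction df minus the empty cells
    (the rule given in the text for a connected arrangement)."""
    treatments = {t for members in blocks.values() for t in members}
    n_obs = sum(len(members) for members in blocks.values())
    full_interaction = (len(blocks) - 1) * (len(treatments) - 1)
    empty_cells = len(blocks) * len(treatments) - n_obs
    return full_interaction - empty_cells

print(dict(pair_concurrences(table_4_19)))
print(is_balanced(table_4_19), error_df(table_4_19))
```

Each of the six treatment pairs shares exactly one block, which is what makes the arrangement balanced, and the error count agrees with the 15 - 12 = 3 computed above.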

Table 4.22 contains a second assignment pattern for assigning the four treatments to the 
six blocks of size two. Treatment A occurs in all blocks and is replicated six times. Treatments 
B, C, and D occur in two blocks providing two replications of each. Model (4.14) can be 
used to describe the data where the index set is 


Index = {(A, 1), (B, 1), (A,2), (C,2), (A,3), (D,3), (A,4), (B,4), (A,5), (C,5), (A,6), (D,6)} 


TABLE 4.20 

Analysis of Variance Table for the Balanced 
Incomplete Block Design Structure in Table 4.19 

Source        df 
Block          5 
Treatments     3 
Error          3 


TABLE 4.21 

Two-Way Structure of Blocks by Treatments Where X Denotes Observed 
Combination Showing That 12 Cells Are Filled and 12 Cells Empty 

Blocks    A    B    C    D 
1         X    X 
2         X         X 
3         X              X 
4              X    X 
5              X         X 
6                   X    X 


TABLE 4.22 

Second Assignment of Four Treatments to Six Blocks of Two Experimental 
Units, Providing an Unbalanced Incomplete Block Design Structure 

Block 1    Block 2    Block 3    Block 4    Block 5    Block 6 
A          A          A          A          A          A 
B          C          D          B          C          D 


The analysis of variance table in Table 4.20 is also appropriate for the arrangement in 
Table 4.22. The arrangement in Table 4.19 is optimal (using D-optimality criteria; St. John 
and Draper, 1975) if one wishes to compare all four treatments to one another, that is, test 
$\mu_A = \mu_B = \mu_C = \mu_D$. The arrangement in Table 4.22 is optimal if one wishes to compare 
treatment A to each of the other three treatments; that is, test $\mu_A = \mu_B$, $\mu_A = \mu_C$, $\mu_A = \mu_D$. 

A third way to assign the four treatments to six blocks is presented in Table 4.23. There 
are two groups of treatments where A and B occur together in three blocks and treatments 
C and D occur together in three blocks. Neither treatment A nor treatment B occurs in a 
block with treatment C or D. The structure between the treatments and blocks is not connected 
because of the separation of the treatments into these two groups. Define a new variable, "group," 
to indicate the two groups of treatments. The comparison between the two groups is a 
comparison between the mean of blocks 1, 3, and 5 and the mean of blocks 2, 4, and 6, that is, a 
between-block comparison. The comparisons of treatments A with B and of treatments C with D are 
within-block comparisons. The number of degrees of freedom associated with the block by 
treatment interaction for treatments A and B is two. Likewise, the number of degrees of 
freedom associated with the block by treatment interaction for treatments C and D is also 
two. The error sum of squares is obtained by pooling these two block by treatment interaction 
sums of squares (as well as their degrees of freedom). The analysis of variance table for 
the arrangement in Table 4.23 is displayed in Table 4.24. The sum of squares due to blocks 
is partitioned into the sum of squares for groups and the sum of squares for blocks 
nested within groups. The sum of squares due to groups carries the hypothesis 
($\mu_A + \mu_B = \mu_C + \mu_D$), which has one of the degrees of freedom due to treatments, and, because 
this is a between-block comparison, the sum of squares for blocks nested within groups serves 
as its error. The two groups of treatments are confounded with the blocks; that is, if there is a 
difference between the two groups' means, you do not know if it is due to the difference between 
the sets of treatments or to the differences between the two groups of blocks. The concept of 
confounding is similar to the concept of aliasing, but aliasing involves two (or more) terms 
being indistinguishable where both terms are from the treatment structure and confounding 
involves two terms being indistinguishable where one term is from the treatment structure 
and one term is from the design structure. 
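
Whether a block-treatment arrangement is connected can be checked mechanically by merging treatments that share a block. The following sketch applies that idea to the three assignments discussed in this example; the dictionaries restate Tables 4.19, 4.22, and 4.23, and the helper name `treatment_groups` is hypothetical.

```python
def treatment_groups(blocks):
    """Return the sets of treatments that are linked, directly or through
    a chain of shared blocks. One set means the design is connected."""
    groups = []
    for members in blocks.values():
        merged = set(members)
        untouched = []
        for group in groups:
            if group & merged:
                merged |= group      # this block links the group to others
            else:
                untouched.append(group)
        groups = untouched + [merged]
    return groups

table_4_19 = {1: ("A", "B"), 2: ("A", "C"), 3: ("A", "D"),
              4: ("B", "C"), 5: ("B", "D"), 6: ("C", "D")}
table_4_22 = {1: ("A", "B"), 2: ("A", "C"), 3: ("A", "D"),
              4: ("A", "B"), 5: ("A", "C"), 6: ("A", "D")}
table_4_23 = {1: ("A", "B"), 2: ("C", "D"), 3: ("A", "B"),
              4: ("C", "D"), 5: ("A", "B"), 6: ("C", "D")}

for name, design in [("Table 4.19", table_4_19),
                     ("Table 4.22", table_4_22),
                     ("Table 4.23", table_4_23)]:
    groups = treatment_groups(design)
    print(name, "connected" if len(groups) == 1 else "not connected",
          sorted(sorted(g) for g in groups))
```

For the third arrangement the sketch returns the two groups {A, B} and {C, D}, which are exactly the levels of the "group" variable that is confounded with blocks.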

TABLE 4.23 

Third Assignment of Four Treatments to Six Blocks of 
Two Experimental Units, Providing an Unconnected 
Incomplete Block Design Structure 

Block 1    Block 2    Block 3    Block 4    Block 5    Block 6 
A          C          A          C          A          C 
B          D          B          D          B          D 


TABLE 4.24 

Analysis of Variance Table for Incomplete Block 
Design in Table 4.23 

Source                                          df 
Groups ($\mu_A + \mu_B = \mu_C + \mu_D$)         1 
Blocks (groups)                                  4 
$\mu_A = \mu_B$                                  1 
$\mu_C = \mu_D$                                  1 
Error                                            4 


The differences between the three designs in Tables 4.19, 4.22, and 4.23 are in the 
assignment of the treatments to the blocks. Remember the design structures and the treatment 
structures are identical for all three designs, so the design and treatment structures do not 
describe the total design of the experiment. One must also specify the method of randomly 
assigning the treatments from the treatment structure to the experimental units in the 
design structure. 


4.3.6 Example 4.6: Row and Column Blocks 

Blocking can occur in many ways and the Latin square design structure is one design 
where there are both row blocks and column blocks. There are various alterations of Latin 
square design structures where there are fewer rows (or columns) or more rows (or 
columns) than there are treatments (Cochran and Cox, 1957). This example consists of the 
experimental units being blocked by rows and columns where the intersection of each 
row and column contains several experimental units. The treatment structure is a two-way 
with factor A having two levels and factor B having two levels, thus generating four 
treatment combinations. There are 24 experimental units arranged in three columns and 
two rows where each row-column combination contains four experimental units. 
Randomly assign the four treatment combinations to the four experimental units within 
each of the row-column groups. Table 4.25 is a display of the assignment of treatment 
combinations (not randomized) to the experimental units. This design structure essentially 
consists of six blocks of size four, but it may be of interest to evaluate the variability 
between the row blocks and among column blocks, so a model can be constructed to 
include those factors as: 

$$y_{ijkm} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + r_k + c_m + (rc)_{km} + \varepsilon_{ijkm} \quad \text{for } i = 1, 2, \tag{4.15}$$

$$j = 1, 2, \quad k = 1, 2, \quad m = 1, 2, 3$$

where $\mu + \alpha_i + \beta_j + (\alpha\beta)_{ij}$ denotes the main effects and interaction of factors A and B, $r_k$ 
denotes the row block effect, $c_m$ denotes the column block effect, $(rc)_{km}$ denotes the interaction 
between the row and column blocks, and $\varepsilon_{ijkm}$ denotes the experimental unit error. The 
analysis of variance table corresponding to model (4.15) is in Table 4.26. The analysis could 
just involve six blocks and the sums of squares due to row blocks, column blocks and their 
interaction could be pooled to provide five degrees of freedom for the design structure. 
The error sum of squares is obtained from the treatment structure by design structure 
interaction; that is, by pooling A x row block, A x column block, A x row block x column 
block, B x row block, B x column block, B x row block x column block, A x B x row block, 
A x B x column block, and A x B x row block x column block sums of squares. 
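
The 15 error degrees of freedom in Table 4.26 can be tallied term by term. The arithmetic below is a worked check under model (4.15), using 1 degree of freedom for each of A, B, and A x B, 1 for row blocks, 2 for column blocks, and 2 for their interaction.

```latex
\begin{align*}
\text{Total} &= 24 - 1 = 23 = \underbrace{5}_{\text{design}}
               + \underbrace{3}_{\text{treatment}} + \underbrace{15}_{\text{error}}, \\
A \times \text{design structure} &= 1(1) + 1(2) + 1(2) = 5, \\
B \times \text{design structure} &= 1(1) + 1(2) + 1(2) = 5, \\
A \times B \times \text{design structure} &= 1(1) + 1(2) + 1(2) = 5, \\
\text{Error} &= 5 + 5 + 5 = 15 = (6 - 1)(4 - 1).
\end{align*}
```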


TABLE 4.25

Design with Row and Column Blocks in the Design Structure and a Two-Way
Arrangement in the Treatment Structure (Nonrandomized Form)

              Column Block 1    Column Block 2    Column Block 3

Row block 1   A1B1  A1B2        A1B1  A1B2        A1B1  A1B2
              A2B1  A2B2        A2B1  A2B2        A2B1  A2B2

Row block 2   A1B1  A1B2        A1B1  A1B2        A1B1  A1B2
              A2B1  A2B2        A2B1  A2B2        A2B1  A2B2


TABLE 4.26

Analysis of Variance Table for Row-Column Design Structure
with Two-Way Treatment Structure

Source                                             df

Design structure                                    5
  Row blocks                                        1
  Column blocks                                     2
  Row blocks × column blocks                        2
Treatment structure                                 3
  A                                                 1
  B                                                 1
  A × B                                             1
Error = design structure × treatment structure     15
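
The degrees of freedom in Table 4.26 can be checked with a few lines of arithmetic. The sketch below is only an illustration; the variable names are chosen here and are not part of the text. It uses the sizes stated above (two row blocks, three column blocks, two levels each of A and B) and treats the error as the design structure by treatment structure interaction.

```python
# Degrees-of-freedom bookkeeping for the row-column design of Table 4.26.
rows, cols, a_levels, b_levels = 2, 3, 2, 2    # sizes stated in the text
units_per_cell = a_levels * b_levels           # four units per row-column cell
n_units = rows * cols * units_per_cell         # 24 experimental units

df_design = {
    "Row blocks": rows - 1,
    "Column blocks": cols - 1,
    "Row blocks x column blocks": (rows - 1) * (cols - 1),
}
df_treatment = {
    "A": a_levels - 1,
    "B": b_levels - 1,
    "A x B": (a_levels - 1) * (b_levels - 1),
}
df_design_total = sum(df_design.values())        # same as rows * cols - 1
df_treatment_total = sum(df_treatment.values())  # same as a_levels * b_levels - 1
# Error = design structure x treatment structure interaction.
df_error = df_design_total * df_treatment_total

for source, df in {**df_design, **df_treatment}.items():
    print(f"{source:30s} {df}")
print(f"{'Error':30s} {df_error}")
```

The three totals (5, 3, and 15) add to 23, one less than the number of experimental units.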


There are many other ways to construct designed experiments by combining various 
design structures and treatment structures. Hopefully, the above examples will enable the 
experimenter to construct the desired designed experiment, construct an appropriate 
model, and develop the corresponding analysis. 


4.4 Concluding Remarks 

This chapter presented concepts and methods experimenters can use in designing and
analyzing experiments. The basic concepts for good designed experiments were also introduced.
All designed experiments consist of two basic features: the treatment structure and
the design structure. These concepts are generally not used in many other statistical analysis
books. Understanding the difference between these two features of a designed experiment
will help data analysts select appropriate analyses for their experiments. The choice
of blocking factors was also discussed; it is imperative that blocking factors do not interact
with the factors in the treatment structure. Finally, it is mandatory that one is able to identify
when one has true replications and when one merely has subsamples.


4.5 Exercises 

4.1 Find two published research papers in the literature which use a two-way or 
higher order treatment structure within a designed experiment which has only 
one size of experimental unit. For each paper, describe in detail the treatment 
structure, the design structure and the experimental unit used in the experiment. 
Comment as to the appropriateness of the design and its analysis [at least as far 
as the information provided by the author(s) is concerned]. 

4.2 The Tire Co. researcher wanted to determine if her new design of tire wears 
better than the existing designs. She selected three existing designs to be 
evaluated with her new design. She had four tires from each design, all of the
same size. She had available four cars and four drivers that could be used during 
the driving test. The test was to measure the tread wear during the 25,000 mile 
driving test. Describe how you would design an appropriate experiment for her. 
Describe in detail the treatment structure and the design structure. Write down 
the appropriate model and key out the corresponding analysis of variance table; 
include sources and degrees of freedom. 

4.3 A food scientist wants to develop a healthy yet enjoyable muffin by changing 
some of the ingredients. He has three main factors to vary, oil at three levels, 
sugar at two levels, and egg white powder at four levels. In order to have a reference
point, he compared the experimental muffins with the standard muffin recipe.
He has a pan in which he can bake 25 muffins at a time. Design an experiment
for him, describe the design and treatment structures, write down an appropriate
model, and key out the corresponding analysis of variance table.

4.4 A plant breeder wants to evaluate how well corn plants of selected varieties
grow in a high temperature-low humidity environment. A growth chamber is
available for the study that can be used to control the temperature and humidity.
She has four cultivars (or treatments) that should be tolerant to the hot-dry environment.
The growth chamber can hold up to seven pots, each pot consisting of
plants from a single cultivar. The growth chamber can be used up to four times. 
The growth of the plants (increase in dry matter per plant) is the measurement 
of interest. Design an experiment for her, describe the design and treatment 
structures, write down an appropriate model, and key out the corresponding 
analysis of variance table. 

4.5 Discuss the changes in the designed experiment that must take place for the 
house paint example if the directions cause differences in the relationships of the 
paint means. 

4.6 A researcher wants to set up a study to evaluate four methods of teaching statistics.
One possible design would be to teach one class of students with each of the
teaching methods and use the students in the class as replications of the teaching 
method. Discuss the implications of using this design. 
Consulting statisticians do not always get the chance to design the experiments for which
they must help construct appropriate analyses. Instead, the statistician must first identify
the type of designed experiment the researcher has employed. The first and most important
step in the identification process is to determine if more than one size of experimental
unit has been used, and if so, to identify each size of experimental unit. As will become 
evident in this section, each size of experimental unit will have an associated design 
structure and treatment structure. After the different sizes of the experimental units have 
been identified, the model for carrying out an appropriate analysis can be constructed by 
combining the models used to describe the design structure and treatment structure 
corresponding to each size of experimental unit. 


5.1 Identifying Sizes of Experimental Units—Four Basic 
Design Structures 

The design structures that involve more than one size of experimental unit are called
multilevel design structures and include split-plot type design structures, strip-plot
design structures, repeated measures design structures, hierarchical or nested types of design
structures, and design structures involving various combinations of the above. Repeated
measures and split-plot type design structures are similar, although the assumptions used
to develop the analyses can be different. Split-plot and strip-plot design structures evolved
from the agricultural sciences, but are used in many other disciplines including engineering
and manufacturing, and their analyses are discussed in Chapters 24 and 25. Repeated
measures designs are used extensively in the social and biological sciences, but are applicable
in most areas when it is of interest to evaluate treatment effects over time, and their
analyses are presented in Chapters 26-28. Nested treatment structures are different from
the treatment structures used with repeated measures and split-plot designs and the
nested treatment structures are described in Chapter 30.
There are four basic design structures and most complex design structures are combinations
of these. The four basic design structures are the completely randomized design
structure, the randomized complete block design structure, the split-plot design structure 
and the strip-plot design structure. Each of these basic design structures has its own 
analysis with a unique process for computing the required error sums of squares. 
The basic design structures are described in this section along with the process needed to 
compute the error sum of squares. An example is used in the discussion where the 
treatment structure is a two-way with one factor at two levels and the other factor is at 
three levels, and the design structure involves 18 experimental units. 

The experiment consists of evaluating the volume of cupcakes after baking where there 
are three recipes and two cooking temperatures; thus the treatment structure consists of the 
six combinations of three recipes with the two cooking temperatures. It is desired to have 
three replications of each of the treatment combinations; thus 18 cupcakes need to be baked. 
The diagram in Figure 5.1 is used to demonstrate the process of using the completely randomized
design structure. The process is to assign, completely at random, the six treatment
combinations in the treatment structure to the 18 experimental units (cupcakes) in the
design structure. The arrows indicate that each treatment combination is assigned to three
cupcakes. Often the process is to make one cupcake at a time, thus the order in which
the cupcakes are mixed and baked corresponds to the experimental units. The completely
randomized design structure is completed by mixing a batch of a given recipe, filling a
cupcake form, and then baking that cupcake in an oven set to the specific temperature. Each
cupcake must be made from its own batch, that is, only one cupcake per batch of cake recipe,
and each cupcake must be baked by itself in the oven set to the specified temperature.
This process requires the use of 18 batches of cake mix and 18 uses of one or more ovens to 
bake the cupcakes. The error associated with the completely randomized design structure 
is computed from the variability among cupcakes treated alike. There are three cupcakes 
from each recipe by temperature combination, thus the variability in the volumes of the 
three cupcakes provides two degrees of freedom to measure cupcake error. One should test 
the equality of the six treatment combination variances (see Chapter 2) and, if possible, the 
estimates of error from the six treatment combinations are pooled together to provide 12 
degrees of freedom for error that measures the variability among cupcakes treated alike. 
A model that can be used to describe the volumes of the cupcakes is 

y_{ijk} = \mu + \tau_i + \beta_j + (\tau\beta)_{ij} + \varepsilon_{ijk},  i = 1, 2,  j = 1, 2, 3,  and  k = 1, 2, 3


[Figure 5.1 diagram: the treatment structure, combinations of temperature (T_i) and recipe (R_j), is randomized onto the design structure of 18 cupcakes.]

FIGURE 5.1 Randomization for a two-way treatment structure in a completely randomized design structure.

TABLE 5.1

Analysis of Variance Table for the Two-Way Treatment
Structure in a Completely Randomized Design Structure

Source                  df    EMS

Temperature              1    \sigma_\varepsilon^2 + \phi^2(\tau)
Recipe                   2    \sigma_\varepsilon^2 + \phi^2(\beta)
Temperature × recipe     2    \sigma_\varepsilon^2 + \phi^2(\tau\beta)
Error                   12    \sigma_\varepsilon^2


where $y_{ijk}$ denotes the volume of the kth cupcake made with the jth recipe and baked at
the ith temperature, $\mu$ denotes the overall mean, $\tau_i$ denotes the effect of the ith temperature,
$\beta_j$ denotes the effect of the jth recipe, $(\tau\beta)_{ij}$ is the temperature by recipe interaction,
and $\varepsilon_{ijk}$ denotes the variability associated with the batches, the cupcakes within a batch,
and the variability from oven bake to oven bake. Table 5.1 contains the analysis of variance
table for the model for the two-way treatment structure in a completely randomized design
structure where there are 12 degrees of freedom computed from the variability of experimental
units or cupcakes treated alike. This analysis has five degrees of freedom associated
with the treatment structure and there are 12 degrees of freedom associated with the
design structure, all of which are assigned to the error term. The column of Table 5.1 labeled
EMS gives the forms of the expected mean squares for the respective rows of the ANOVA
table. The functions $\phi^2(\tau)$, $\phi^2(\beta)$, and $\phi^2(\tau\beta)$ represent quadratic functions in the temperature
main effect means, the recipe main effect means, and the interaction effects, respectively.
These functions are non-negative and equal to zero when the corresponding effects
do not exist. Similar interpretations can be used throughout the remainder of this chapter.
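
The randomization for the completely randomized design structure can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure; the function name `randomize_crd`, the labels T1, T2, R1, R2, R3, and the seed are assumptions made here. Each of the six treatment combinations is listed three times and the 18 entries are shuffled, so the resulting order is the order in which the cupcakes are mixed and baked.

```python
import random
from itertools import product

def randomize_crd(temperatures, recipes, reps, seed=None):
    """Return a run order with one (temperature, recipe) pair per cupcake."""
    rng = random.Random(seed)
    combos = list(product(temperatures, recipes))
    plan = combos * reps      # each combination appears `reps` times
    rng.shuffle(plan)         # complete randomization over all units
    return plan

plan = randomize_crd(["T1", "T2"], ["R1", "R2", "R3"], reps=3, seed=1)
for run, (temp, recipe) in enumerate(plan, start=1):
    print(f"cupcake {run:2d}: {temp} {recipe}")

n_units = len(plan)
df_treatment = 2 * 3 - 1                      # six treatment combinations
df_error = n_units - 1 - df_treatment         # units treated alike
print("treatment df:", df_treatment, "error df:", df_error)
```

With 18 cupcakes, the sketch reproduces the five treatment and 12 error degrees of freedom of Table 5.1.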

The second basic design structure is the randomized complete block design and the 
diagram in Figure 5.2 is a display of the process of assigning treatments from the treatment
structure to the experimental units in the design structure. Suppose that the
researcher can make and bake six cupcakes per day, so the experiment must be spread 


[Figure 5.2 diagram: the treatment structure, combinations of temperature (T_i) and recipe (R_j), is randomized onto the experimental units of the design structure.]

FIGURE 5.2 Randomization scheme for two-way treatment structure in randomized complete block design
structure.
over three days in order to achieve three replications of each of the treatment combinations.
The experimental units of the design structure are divided into three groups or
blocks of size six. Next, the six treatments from the treatment structure are randomly 
assigned to the six experimental units within each of the blocks, as indicated by the arrows 
in Figure 5.2. As discussed in Chapter 4, the error sum of squares for the randomized 
complete block design structure is obtained by computing the treatment structure by 
design structure or treatment by block interaction. This design involves six treatments and 
three blocks, so the treatment by block interaction provides (3 - 1)(6 - 1) = 10 degrees of 
freedom associated with experimental error. The 10 degrees of freedom consist of pooling 
the degrees of freedom associated with the block by temperature, the block by recipe, and 
the block by temperature by recipe interactions. A model that can be used to describe the 
volumes of cupcakes for this two-way treatment structure in a randomized complete block 
design structure is 

y_{ijk} = \mu + \tau_i + \beta_j + (\tau\beta)_{ij} + d_k + \varepsilon_{ijk},  i = 1, 2,  j = 1, 2, 3,  and  k = 1, 2, 3

where $d_k$ denotes the effect of the kth day, the blocking factor, and $\varepsilon_{ijk}$ denotes the variability
associated with the batches, the cupcakes within a batch, and the variability from oven
bake to oven bake within a day. Table 5.2 contains the analysis of variance table for the
model for the two-way treatment structure in a randomized complete block design structure
where there are 10 degrees of freedom computed from the variability of experimental
units treated alike as measured by the block by treatment combination interaction. This
analysis has the same five degrees of freedom associated with the treatment structure
as the completely randomized design structure, but now, the 12 degrees of freedom for
the design structure are distributed between days (blocks) and the error. The term $\sigma_{day}^2$ in
Table 5.2 represents the variance of $d_k$, k = 1, 2, 3. Similar interpretations can be made
throughout the remainder of this chapter.
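
The randomized complete block randomization differs from the completely randomized one only in that the shuffle is carried out separately within each day. The sketch below illustrates this under assumed names (`randomize_rcb`, the treatment labels, and the seed are hypothetical) and then computes the error degrees of freedom as the block by treatment interaction.

```python
import random
from itertools import product

def randomize_rcb(temperatures, recipes, n_blocks, seed=None):
    """Randomize all treatment combinations separately within each block (day)."""
    rng = random.Random(seed)
    combos = list(product(temperatures, recipes))
    plan = {}
    for day in range(1, n_blocks + 1):
        order = combos[:]      # every block receives every combination once
        rng.shuffle(order)     # independent randomization within the block
        plan[day] = order
    return plan

plan = randomize_rcb(["T1", "T2"], ["R1", "R2", "R3"], n_blocks=3, seed=2)
for day, order in plan.items():
    print(f"day {day}:", " ".join(t + r for t, r in order))

n_blocks, n_treatments = len(plan), len(plan[1])
df_blocks = n_blocks - 1
df_treatments = n_treatments - 1
df_error = df_blocks * df_treatments   # block by treatment interaction
print("days:", df_blocks, "treatments:", df_treatments, "error:", df_error)
```

The 12 design-structure degrees of freedom split into two for days and 10 for error, as in Table 5.2.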

The third basic design structure is the split-plot design structure. Here the 18 cupcakes 
are divided into six blocks of size three, as shown in Figure 5.3. Since there are six treatment
combinations in the treatment structure, all six treatments cannot occur within a
block, thus this is an incomplete block design structure. Three cupcakes, one from each 
of the recipes, will be baked at a specified temperature within the same oven (only one 
cupcake was baked in an oven for the first two basic design structures). The diagram in 
Figure 5.3 shows that the treatment structure has been separated into two parts, one designated
as the cupcake part and one for the oven part. The blocks of size three or the three
cupcakes assigned to each oven form the experimental units for the levels of temperature. 
Thus the oven is the experimental unit for temperature. The first part of the randomization 


TABLE 5.2

Analysis of Variance Table for the Two-Way Treatment Structure
in a Randomized Complete Block Design Structure

Source                  df    EMS

Day                      2    \sigma_\varepsilon^2 + 6\sigma_{day}^2
Temperature              1    \sigma_\varepsilon^2 + \phi^2(\tau)
Recipe                   2    \sigma_\varepsilon^2 + \phi^2(\beta)
Temperature × recipe     2    \sigma_\varepsilon^2 + \phi^2(\tau\beta)
Error                   10    \sigma_\varepsilon^2


[Figure 5.3 diagram: the cupcake treatment structure is randomized to the cupcakes (subplot or small size of experimental unit); the oven treatment structure (Temperature 1, Temperature 2) is randomized to the ovens (whole-plot or large size of experimental unit).]

FIGURE 5.3 Randomization scheme for the split-plot with completely randomized whole-plot design structure.


procedure is to randomly assign each temperature to three of the ovens or blocks, as 
demonstrated by the arrows going from the levels of temperature to the ovens in Figure 5.3. 
The cupcakes within the ovens are the experimental units for the levels of recipe and the 
randomization procedure is to randomly assign the levels of recipe to one cupcake within 
each of the ovens. There are two sizes of experimental units and there are two design 
and treatment structures. The treatment and design structures for the oven experimental
units consist of a one-way treatment structure (two levels of temperature) in a completely
randomized design structure with six ovens. For the individual cupcakes, the treatment
and design structures consist of a one-way treatment structure (three levels of recipe) in a 
randomized complete block design structure where the ovens represent blocks of similar 
experimental units. 

This design structure is a nested or hierarchical structure as the cupcakes are nested 
within the ovens. Thus, the split-plot design is also a hierarchical design structure. Since 
there are two sizes of experimental units, this is called a multilevel design. The oven is the 
larger size of experimental unit and is often called the whole-plot. The cupcake is the 
smaller size of experimental unit and is often called the subplot or split-plot. 
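
The two-stage randomization of the split-plot design structure can be sketched as follows. The code is an illustration under assumed names (`randomize_split_plot`, the labels, and the seed are hypothetical): temperatures are first assigned at random to the six ovens, three ovens per temperature, and then the three recipes are assigned at random to the cupcake positions within each oven.

```python
import random

def randomize_split_plot(temperatures, recipes, ovens_per_temp, seed=None):
    """Two-stage randomization: temperatures to ovens, then recipes within ovens."""
    rng = random.Random(seed)
    # Stage 1 (whole plots): each temperature goes to `ovens_per_temp` ovens.
    whole_plots = [t for t in temperatures for _ in range(ovens_per_temp)]
    rng.shuffle(whole_plots)
    plan = []
    for oven, temp in enumerate(whole_plots, start=1):
        # Stage 2 (subplots): one cupcake of every recipe inside each oven.
        positions = recipes[:]
        rng.shuffle(positions)
        plan.append({"oven": oven, "temperature": temp, "recipes": positions})
    return plan

plan = randomize_split_plot(["T1", "T2"], ["R1", "R2", "R3"], ovens_per_temp=3, seed=3)
for row in plan:
    print(f"oven {row['oven']}: {row['temperature']} -> {' '.join(row['recipes'])}")
```

Every oven holds a single temperature and all three recipes, which is what makes the oven the experimental unit for temperature and the cupcake the experimental unit for recipe.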

The first step in the analysis of this split-plot design is to ignore the individual cupcakes or 
recipes and just consider the two temperatures and the six ovens. The display in Figure 5.4 
indicates that the design corresponding to the oven size of experimental unit is a one-way 
treatment structure in a completely randomized design structure. A model to describe a 
response measured on each of these ovens is 

y^*_{ik} = \mu + \tau_i + e_{ik},  i = 1, 2,  and  k = 1, 2, 3

where $y^*_{ik}$ denotes the response measured on the kth oven assigned to the ith temperature
and $e_{ik}$ represents the error term associated with the ovens. Table 5.3 contains the analysis

[Figure 5.4 diagram: ovens containing Recipe 1, Recipe 2, and Recipe 3 cupcakes, with the oven treatment structure (Temperature 1, Temperature 2) assigned to the ovens.]

FIGURE 5.4 Oven design and treatment structures for split-plot design.

of variance table for the oven model. For each level of temperature there are three ovens 
that are treated alike, thus there are two degrees of freedom available from each temperature 
measuring how ovens vary when treated alike. If the variances of the ovens between the 
two temperatures are equal, then the two variances can be pooled to provide the error 
term for ovens with four degrees of freedom. 
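
The pooling step for the oven error can be illustrated numerically. The oven-level responses below are hypothetical values invented for this sketch (for instance, the mean cupcake volume per oven); they are not data from the text. Each temperature contributes a sample variance with two degrees of freedom, and the pooled variance weights each by its degrees of freedom.

```python
from statistics import variance

# Hypothetical oven-level responses, three ovens per temperature.
oven_response = {"T1": [11.2, 11.0, 11.7], "T2": [12.4, 13.0, 12.4]}

df_each = {t: len(v) - 1 for t, v in oven_response.items()}    # 2 df per temperature
var_each = {t: variance(v) for t, v in oven_response.items()}  # ovens treated alike

df_pooled = sum(df_each.values())
var_pooled = sum(df_each[t] * var_each[t] for t in oven_response) / df_pooled

print("within-temperature variances:", var_each)
print("pooled oven error:", var_pooled, "on", df_pooled, "df")
```

Pooling is appropriate only when the two within-temperature variances can be regarded as equal, as the text notes.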

The next step in the analysis is to ignore the levels of temperature providing a design 
that is a one-way treatment structure (three recipes) in a randomized complete block design 
structure (six blocks). The randomization process is displayed in Figure 5.5. A model that 
can be used to describe the volume of a cupcake is 

y_{ijk} = \mu + \beta_j + o_{ik} + \varepsilon^*_{ijk},  i = 1, 2,  j = 1, 2, 3,  and  k = 1, 2, 3

where $o_{ik}$ denotes the block or oven effect and $\varepsilon^*_{ijk}$ denotes the error associated
with a cupcake. The analysis of variance table is in Table 5.4 where the residual sum of
squares consists of the recipe by oven interaction. If all ovens were treated alike, the 
residual sum of squares would provide an estimate of the cupcake to cupcake variability. 
However, some ovens are subjected to one temperature and the other ovens are subjected 
to a second temperature; hence, the recipe by oven interaction includes the temperature by 

TABLE 5.3

Analysis of Variance Table for the One-Way Treatment Structure
in a Completely Randomized Design Structure for the Ovens as
Experimental Units Where the Levels of Recipe Are Ignored

Source           df    EMS

Temperature       1    \sigma_e^2 + \phi^2(\tau)
Error (ovens)     4    \sigma_e^2


[Figure 5.5 diagram: the cupcake treatment structure (Recipe 1, Recipe 2, Recipe 3) is randomized to the cupcakes (subplot or small size of experimental unit) within each oven; the oven treatment structure (Temperature 1, Temperature 2) is shown below the ovens.]

FIGURE 5.5 Randomized complete block design structure for the cupcake experimental unit.


recipe interaction. The expected mean squares in Table 5.4 are not fully determined by this
model, and that is denoted by the "?" in the table. One additional reduction in the design
is required in order to obtain the cupcake to cupcake variability. Consider only those ovens 
subjected to temperature 1, as displayed by the dark lines in Figure 5.6. The reduced design 
is that of a one-way treatment structure in a randomized complete block design structure 
with three blocks all treated alike (temperature 1), and the analysis of variance table is 
given in Table 5.5. The recipe by oven interaction provides the measure of how cupcakes 
within an oven treated alike will vary. The process continues by considering the other 
three ovens at the second temperature where the recipe by oven interaction provides an 
additional four degrees of freedom for measuring how cupcakes vary when treated alike 
within an oven. When the variances of the cupcakes from the two temperatures are equal 
(see Chapter 2 for tests), one can pool the two sources together to provide the error sum of 
squares for measuring how cupcakes vary when treated alike with eight degrees of freedom.
The cupcake error sum of squares is obtained by computing the oven by recipe
interaction within a temperature and pooling across temperatures. The final complete split-plot


TABLE 5.4

Analysis of Variance Table for the One-Way Treatment
Structure in a Randomized Complete Block Design
Structure Where the Levels of Temperature Are Ignored

Source      df    EMS

Ovens        5    \sigma_\varepsilon^2 + 3\sigma_{oven}^2 + ?
Recipe       2    \sigma_\varepsilon^2 + \phi^2(\beta) + ?
Residual    10    \sigma_\varepsilon^2 + ?


TABLE 5.5

Analysis of Variance Table for Temperature 1 Data for
the One-Way Treatment Structure in a Randomized
Complete Block Design Structure

Source      df    EMS

Ovens        2    \sigma_\varepsilon^2 + 3\sigma_{oven}^2
Recipe       2    \sigma_\varepsilon^2 + \phi^2(\beta)
Residual     4    \sigma_\varepsilon^2


analysis of variance table with the recipe by temperature interaction separated from the 
cupcake residual is given in Table 5.6. A model to describe the data from a two-way treatment
structure in a split-plot design structure with a completely randomized
whole-plot design structure is

y_{ijk} = \mu + \tau_i + \beta_j + (\tau\beta)_{ij} + o_{ik} + \varepsilon_{ijk},  i = 1, 2,  j = 1, 2, 3,  and  k = 1, 2, 3

where $o_{ik}$ denotes oven variation within a temperature and $\varepsilon_{ijk}$ denotes the variability of
cupcakes within an oven. There are still five degrees of freedom associated with the treatment
structure in Table 5.6, but the 12 degrees of freedom associated with the design structure
are distributed between the two error terms where there are four degrees of freedom
associated with the oven error component and eight degrees of freedom associated with 
the cupcake error component. 
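
The partition behind the split-plot analysis of variance can be checked numerically. The sketch below uses hypothetical cupcake volumes invented for illustration and plain Python with no statistics package; the index order `y[i][k][j]` (temperature, oven within temperature, recipe) is a choice made here. The oven error is the variation of oven means within a temperature, and the cupcake error is the oven by recipe interaction within a temperature, pooled across temperatures.

```python
from statistics import mean

# Hypothetical volumes: y[i][k][j] = temperature i, oven k within i, recipe j.
y = [
    [[10.1, 11.4, 12.0], [9.8, 11.0, 12.3], [10.5, 11.9, 12.6]],
    [[11.2, 12.1, 14.0], [11.8, 12.9, 14.4], [11.0, 12.4, 13.7]],
]
a, n, b = len(y), len(y[0]), len(y[0][0])  # temperatures, ovens per temperature, recipes

grand = mean(v for temp in y for oven in temp for v in oven)
temp_mean = [mean(v for oven in y[i] for v in oven) for i in range(a)]
oven_mean = [[mean(y[i][k]) for k in range(n)] for i in range(a)]
recipe_mean = [mean(y[i][k][j] for i in range(a) for k in range(n)) for j in range(b)]
cell_mean = [[mean(y[i][k][j] for k in range(n)) for j in range(b)] for i in range(a)]

ss = {
    "Temperature": b * n * sum((m - grand) ** 2 for m in temp_mean),
    "Error (oven)": b * sum((oven_mean[i][k] - temp_mean[i]) ** 2
                            for i in range(a) for k in range(n)),
    "Recipe": a * n * sum((m - grand) ** 2 for m in recipe_mean),
    "Temperature x recipe": n * sum(
        (cell_mean[i][j] - temp_mean[i] - recipe_mean[j] + grand) ** 2
        for i in range(a) for j in range(b)),
    "Error (cupcake)": sum(
        (y[i][k][j] - oven_mean[i][k] - cell_mean[i][j] + temp_mean[i]) ** 2
        for i in range(a) for k in range(n) for j in range(b)),
}
df = {
    "Temperature": a - 1,
    "Error (oven)": a * (n - 1),
    "Recipe": b - 1,
    "Temperature x recipe": (a - 1) * (b - 1),
    "Error (cupcake)": a * (n - 1) * (b - 1),
}
ss_total = sum((v - grand) ** 2 for temp in y for oven in temp for v in oven)
ms = {source: ss[source] / df[source] for source in ss}

for source in ss:
    print(f"{source:22s} df={df[source]:2d}  SS={ss[source]:8.4f}  MS={ms[source]:8.4f}")
# Temperature is compared with the oven error; recipe effects with the cupcake error.
print("F temperature:", ms["Temperature"] / ms["Error (oven)"])
print("F recipe:     ", ms["Recipe"] / ms["Error (cupcake)"])
```

For balanced data such as these, the five sums of squares add to the total sum of squares and the degrees of freedom add to 17, matching the layout of Table 5.6.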

As a variation on the split-plot design structure, suppose that the researcher can only do 
two oven runs within a day; thus, the ovens are separated into three sets of two ovens 
where the study will take three days to complete. The diagram in Figure 5.7 depicts the 
TABLE 5.6

Analysis of Variance Table for the Two-Way Treatment
Structure in a Split-Plot Design with a Completely
Randomized Whole-Plot Design Structure

Source                  df    EMS

Temperature              1    \sigma_\varepsilon^2 + 3\sigma_{oven}^2 + \phi^2(\tau)
Error (oven)             4    \sigma_\varepsilon^2 + 3\sigma_{oven}^2
Recipe                   2    \sigma_\varepsilon^2 + \phi^2(\beta)
Temperature × recipe     2    \sigma_\varepsilon^2 + \phi^2(\tau\beta)
Error (cupcake)          8    \sigma_\varepsilon^2

random assignment of the levels of temperature to the ovens within each day and then 
randomly assigning the levels of recipes within each of the ovens. The oven design structure
is a randomized complete block design, so the oven error term is computed by the day
by temperature interaction. Imposing a blocking structure on the ovens or whole-plots 
does not change the design structure for the cupcakes or subplots. There still are six blocks 
of size three, so the cupcake part of the analysis does not change. A model to describe the 
volume of cupcakes for a two-way treatment structure in a split-plot design structure with 
a randomized complete block whole-plot design structure is 

y_{ijk} = \mu + \tau_i + \beta_j + (\tau\beta)_{ij} + d_k + o_{ik} + \varepsilon_{ijk},  i = 1, 2,  j = 1, 2, 3,  and  k = 1, 2, 3

where $d_k$ denotes the day effect, $o_{ik}$ denotes the oven variation within a day, and $\varepsilon_{ijk}$ denotes
the variability of cupcakes within an oven within a day. The analysis of variance table
corresponding to the above model is given in Table 5.7, which includes rows for days,
error (oven) and error (cupcake).
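
When days are used as blocks for the ovens, the whole-plot randomization is restricted to each day while the subplot randomization is unchanged. The sketch below is an illustration under assumed names (`randomize_split_plot_rcb`, the labels, and the seed are hypothetical); it also tabulates the degrees of freedom implied by the model above.

```python
import random

def randomize_split_plot_rcb(temperatures, recipes, n_days, seed=None):
    """Randomize temperatures to ovens within each day, then recipes within ovens."""
    rng = random.Random(seed)
    plan = []
    for day in range(1, n_days + 1):
        temps = temperatures[:]
        rng.shuffle(temps)                 # whole-plot randomization within the day
        for oven, temp in enumerate(temps, start=1):
            order = recipes[:]
            rng.shuffle(order)             # subplot randomization within the oven
            plan.append((day, oven, temp, order))
    return plan

plan = randomize_split_plot_rcb(["T1", "T2"], ["R1", "R2", "R3"], n_days=3, seed=4)
for day, oven, temp, order in plan:
    print(f"day {day}, oven {oven}: {temp} -> {' '.join(order)}")

days, a, b = 3, 2, 3
df = {
    "Day": days - 1,
    "Temperature": a - 1,
    "Error (oven)": (days - 1) * (a - 1),          # day by temperature interaction
    "Recipe": b - 1,
    "Temperature x recipe": (a - 1) * (b - 1),
    "Error (cupcake)": a * (days - 1) * (b - 1),   # oven by recipe within temperature
}
for source, value in df.items():
    print(f"{source:22s} {value}")
```

Blocking the ovens by day moves two degrees of freedom from the oven error to days and leaves the eight cupcake error degrees of freedom untouched.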


[Figure 5.7 diagram: the cupcake treatment structure (Recipe 1, Recipe 2, Recipe 3) is randomized to the cupcakes (subplot or small size of experimental unit); the oven treatment structure (Temperature 1, Temperature 2) is randomized to the ovens (whole-plot or large size of experimental unit).]

FIGURE 5.7 Diagram of split-plot design structure with a randomized complete block whole-plot (oven)
design structure.

TABLE 5.7

Analysis of Variance Table for the Two-Way Treatment
Structure in a Split-Plot Design with a Randomized Complete
Block Whole-Plot Design Structure

Source                  df    EMS

Day                      2    \sigma_\varepsilon^2 + 3\sigma_{oven}^2 + 6\sigma_{day}^2
Temperature              1    \sigma_\varepsilon^2 + 3\sigma_{oven}^2 + \phi^2(\tau)
Error (oven)             2    \sigma_\varepsilon^2 + 3\sigma_{oven}^2
Recipe                   2    \sigma_\varepsilon^2 + \phi^2(\beta)
Temperature × recipe     2    \sigma_\varepsilon^2 + \phi^2(\tau\beta)
Error (cupcake)          8    \sigma_\varepsilon^2


As mentioned above, the split-plot design structures are incomplete block designs. 
Figure 5.8 contains the display of the assignment of treatment combinations to the ovens 
or blocks within each of the days. Only three treatment combinations can occur within 
each of the blocks and, in fact, within a block only those treatment combinations with the 
same level of temperature can occur. The resulting design is a partially balanced incomplete
block where some treatment combinations always occur together within a block and
some treatment combinations never occur together within a block. 
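
The partially balanced pattern can be seen by counting how often each pair of treatment combinations shares a block. The sketch below is an illustration with assumed labels; it builds the six oven blocks (one per temperature per day, each holding all three recipes at that temperature) and tallies pairwise concurrences.

```python
from collections import Counter
from itertools import combinations, product

temperatures, recipes = ["T1", "T2"], ["R1", "R2", "R3"]
days = 3
# One oven (block) per temperature per day; each holds every recipe at that temperature.
blocks = [[(t, r) for r in recipes] for _ in range(days) for t in temperatures]

concurrence = Counter()
for block in blocks:
    for pair in combinations(sorted(block), 2):
        concurrence[pair] += 1

all_pairs = list(combinations(sorted(product(temperatures, recipes)), 2))
for pair in all_pairs:
    print(pair, concurrence[pair])   # Counter returns 0 for pairs never seen together

together = [p for p in all_pairs if concurrence[p] > 0]
never = [p for p in all_pairs if concurrence[p] == 0]
print(len(together), "pairs occur together;", len(never), "pairs never do")
```

Pairs that share a temperature appear together in every block containing them, while pairs with different temperatures never meet in a block.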

The fourth basic design structure is a strip-plot design. The strip-plot design structure 
is constructed by first arranging the experimental units into rectangles with rows and 
columns, as displayed in Figure 5.9. In this experiment, a batch of cake dough is mixed 
using one of the recipes and two cupcakes are extracted from the batch. The batch is the 
entity being made from a recipe, so the batch is the experimental unit for the levels of 
recipe. One of the cupcakes is to be baked at temperature 1 and the other cupcake is to be 
baked at temperature 2. Hence, each oven will include three cupcakes, one from each recipe.
The oven is the entity to which a level of temperature is assigned and is the experimental
unit for the levels of temperature.

The columns of the rectangles in Figure 5.9 correspond to the batches of cake dough and 
the three recipes are randomly assigned to the columns; that is, both cupcakes in a column 
FIGURE 5.8 Split-plot design structure expressed as an incomplete block design structure.



are from the same batch and the batch is the experimental unit for the levels of recipe. 
The rows consist of three cupcakes and correspond to the ovens, which are the experimental
units for the levels of temperature. The analysis of this design can be constructed by
considering the design and treatment structures for each size of experimental unit. First,
ignore the recipes and only consider the rows of the rectangles, then the oven design is a
one-way treatment structure in a randomized complete block design structure. The error
term corresponding to ovens is computed from the rectangle (or day) by temperature interaction,
which provides two degrees of freedom for error (oven), as displayed in Table 5.8.
Next, ignore the temperatures and only consider the columns of the rectangles, then the 
batch design is a one-way treatment structure in a randomized complete block design 
structure. The error term corresponding to the batches is computed from the rectangle 
(or day) by recipe interaction which provides four degrees of freedom for error (batch) as 
displayed in Table 5.9. Finally, the interactions between the recipes and temperatures are 
contrasts that are free of the row effects and free of the column effects, leaving the cupcake 


TABLE 5.8

Analysis of Variance Table for the Temperature Part of the
Treatment Structure in the Strip-Plot Design Structure

Source          df    EMS

Day              2    \sigma_{oven}^{2*} + 2\sigma_{day}^{2*}
Temperature      1    \sigma_{oven}^{2*} + \phi^2(\tau)
Error (oven)     2    \sigma_{oven}^{2*}
TABLE 5.9

Analysis of Variance Table for the Recipe Part of the
Treatment Structure in a Strip-Plot Design Structure

Source           df    EMS

Day               2    \sigma_{batch}^{2*} + 3\sigma_{day}^{2*}
Recipe            2    \sigma_{batch}^{2*} + \phi^2(\beta)
Error (batch)     4    \sigma_{batch}^{2*}


to be the experimental unit for interaction comparisons. The interaction between recipes, 
temperatures, and rectangles provides the cupcake error term, which in this case has four 
degrees of freedom. This design involves three sizes of experimental units and a model 
that can be used to describe data from this structure is 

y_{ijk} = \mu + \tau_i + \beta_j + (\tau\beta)_{ij} + d_k + o_{ik} + b_{jk} + \varepsilon_{ijk},  i = 1, 2,  j = 1, 2, 3,  and  k = 1, 2, 3

where $d_k$ denotes the rectangle or day effect, $o_{ik}$ denotes the oven effect within a rectangle,
$b_{jk}$ denotes the batch effect within a rectangle, and $\varepsilon_{ijk}$ denotes the cupcake effect within a
batch, oven and rectangle. The analysis of variance table corresponding to the strip-plot
model is displayed in Table 5.10 where there are three error terms. There are still five
degrees of freedom associated with the treatment structure, but the 12 degrees of freedom
associated with the design structure are distributed as two for rectangles, two for error
(oven), four for error (batch) and four for error (cupcake). The strip-plot design structure is
a multilevel design, but it is not a hierarchical design structure since the rows are not 
nested within the columns and the columns are not nested within the rows. The rows and 
columns are nested within a rectangle, but that is where the nesting stops. 
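
The allocation of degrees of freedom in the strip-plot design structure follows directly from the three sizes of experimental unit. The sketch below is an illustration with variable names chosen here; each error term is the interaction of rectangles (days) with the corresponding part of the treatment structure.

```python
# Degrees of freedom for the strip-plot design: 3 days, 2 temperatures, 3 recipes.
days, temps, recipes = 3, 2, 3

df = {
    "Day": days - 1,
    "Temperature": temps - 1,
    "Error (oven)": (days - 1) * (temps - 1),        # day x temperature
    "Recipe": recipes - 1,
    "Error (batch)": (days - 1) * (recipes - 1),     # day x recipe
    "Temperature x recipe": (temps - 1) * (recipes - 1),
    "Error (cupcake)": (days - 1) * (temps - 1) * (recipes - 1),  # day x temp x recipe
}
for source, value in df.items():
    print(f"{source:22s} {value}")

df_treatment = df["Temperature"] + df["Recipe"] + df["Temperature x recipe"]
df_design = sum(df.values()) - df_treatment
print("treatment structure:", df_treatment, "design structure:", df_design)
```

The output reproduces the split described above: five degrees of freedom for the treatment structure and 12 for the design structure, divided as 2, 2, 4, and 4.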

As will be discussed in Chapter 20, the expected mean squares corresponding to an 
effect will indicate which error term is used in evaluating hypotheses associated with that
effect. Some expected mean squares involve variances with superscripts to indicate
that there are other factors influencing the variability than are indicated by the subscript. 
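
As a hedged illustration of how expected mean squares point to an error term, consider the strip-plot rows of Table 5.10. The ratios below are a sketch, not a result quoted from Chapter 20; the symbols $MS$ and $F$ are introduced here, and the usual assumption of independent, normally distributed random effects is taken for granted. In each case the denominator is the error term whose expected mean square equals that of the numerator when the corresponding quadratic function is zero.

```latex
\begin{align*}
H_0\colon \phi^2(\tau) = 0:
  &\quad F = \frac{MS_{\text{Temperature}}}{MS_{\text{Error (oven)}}},
  &&\text{both with expectation } \sigma_\varepsilon^2 + 3\sigma_{oven}^2 \text{ under } H_0,
  \quad df = (1, 2) \\
H_0\colon \phi^2(\beta) = 0:
  &\quad F = \frac{MS_{\text{Recipe}}}{MS_{\text{Error (batch)}}},
  &&\text{both with expectation } \sigma_\varepsilon^2 + 2\sigma_{batch}^2 \text{ under } H_0,
  \quad df = (2, 4) \\
H_0\colon \phi^2(\tau\beta) = 0:
  &\quad F = \frac{MS_{\text{Temperature} \times \text{recipe}}}{MS_{\text{Error (cupcake)}}},
  &&\text{both with expectation } \sigma_\varepsilon^2 \text{ under } H_0,
  \quad df = (2, 4)
\end{align*}
```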

The process of writing the error terms in the analysis of variance table by including the 
size of experimental unit associated with the error in parentheses is a convention used 
throughout the rest of this book whenever there is more than one size of experimental unit 


TABLE 5.10

Analysis of Variance Table for the Two-Way Treatment
Structure in a Strip-Plot Design Structure

Source                  df    EMS

Day                      2    \sigma_\varepsilon^2 + 3\sigma_{oven}^2 + 2\sigma_{batch}^2 + 6\sigma_{day}^2
Temperature              1    \sigma_\varepsilon^2 + 3\sigma_{oven}^2 + \phi^2(\tau)
Error (oven)             2    \sigma_\varepsilon^2 + 3\sigma_{oven}^2
Recipe                   2    \sigma_\varepsilon^2 + 2\sigma_{batch}^2 + \phi^2(\beta)
Error (batch)            4    \sigma_\varepsilon^2 + 2\sigma_{batch}^2
Temperature × recipe     2    \sigma_\varepsilon^2 + \phi^2(\tau\beta)
Error (cupcake)          4    \sigma_\varepsilon^2

involved in a study. This convention enables researchers to easily identify the sources of
variation in their studies. 

More complex design structures often conserve the material and
time required to carry out the studies. For the cupcake baking experiment, the completely
randomized and randomized complete block design structures require 18 batches of cupcake
dough to be made and an oven to be used 18 times for 18 bakes. The split-plot design structures
involve 18 batches of cake dough (one for each cupcake), but only six baking times.
Hence the split-plot design uses only one-third of the time for baking the cupcakes as that
required for the completely randomized design. The strip-plot requires nine batches of cake
dough and six baking times. The strip-plot design structure uses only one-half of the batches
of cake dough and one-third of the time for baking. Using the more complex design structures
is often time- and resource-conserving as well as being a more convenient way to perform
experiments, provided one can resolve the necessary error terms as discussed above.
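The batch and bake counts quoted above follow from simple products of the level counts. The sketch below assumes three days, two temperatures, and three recipes (18 cupcakes in all); the function name `resources` is hypothetical.

```python
def resources(days: int, temps: int, recipes: int) -> dict:
    """Batches of dough and oven bakes needed by each cupcake design."""
    cupcakes = days * temps * recipes
    return {
        # Every cupcake gets its own batch and its own bake.
        "completely randomized": {"batches": cupcakes, "bakes": cupcakes},
        "randomized complete block": {"batches": cupcakes, "bakes": cupcakes},
        # One bake per temperature per day; one batch per cupcake.
        "split-plot": {"batches": cupcakes, "bakes": days * temps},
        # One batch per recipe per day; one bake per temperature per day.
        "strip-plot": {"batches": days * recipes, "bakes": days * temps},
    }


needs = resources(days=3, temps=2, recipes=3)
for design, counts in needs.items():
    print(f"{design:26s} batches={counts['batches']:2d} bakes={counts['bakes']:2d}")
```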

The degrees of freedom associated with the error corresponding to the smallest experimental
unit size are reduced as more structure is imposed on the design. However, in the
previous example, the error associated with a completely randomized design includes
variation due to ovens, batches, and cupcakes within a batch. The split-plot design has two
error terms, where part of the error is designated as due to variability among ovens and the
other part is the variability among cupcakes, which also includes batch-to-batch variability.
The strip-plot design has three error terms, where the variance in the study is split into
variability among ovens, variability among batches, and variability among cupcakes within
a batch. Using the more complex design structures provides fewer degrees of freedom for
the cupcake error term, but that error term is refined down to the cupcake-to-cupcake
variability in the strip-plot design structure, whereas it involves oven and batch variability in the
completely randomized and randomized complete block designs and batch variability in the
split-plot design. Therefore, the fact that there are fewer degrees of freedom does not necessarily mean
that there is less power for the comparisons among the treatment factors, as the magnitude
of the important variance components can also decrease, providing an increase in power.
The more complex designs have error terms with fewer sources of variability than do the
error terms of the simpler design structures. A summary of the sources of variability
associated with each of the error terms for the cupcake designs is given in Table 5.11.
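To make the trade-off between degrees of freedom and error variance concrete, the following sketch adds up the variance components that enter the smallest-unit error term of each design, following the sources listed in Table 5.11. The numerical values of the variance components are purely hypothetical, chosen only for illustration, and the dictionary names are not from the text.

```python
# Hypothetical variance components (assumed values, not estimates from the text).
components = {"day": 4.0, "oven": 2.0, "batch": 1.5, "cupcake": 1.0}

# Sources of variability in the cupcake-level error term of each design,
# as summarized in Table 5.11.
cupcake_error_sources = {
    "completely randomized": ["day", "batch", "oven", "cupcake"],
    "randomized complete block": ["batch", "oven", "cupcake"],
    "split-plot": ["batch", "cupcake"],
    "strip-plot": ["cupcake"],
}

# Variance estimated by the cupcake-level error term under each design.
error_variance = {
    design: sum(components[source] for source in sources)
    for design, sources in cupcake_error_sources.items()
}

for design, variance in error_variance.items():
    print(f"{design:26s} error variance = {variance:.1f}")
```

With these assumed values the error variance shrinks as structure is added, which is the mechanism by which a design with fewer error degrees of freedom can still give more powerful comparisons.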


TABLE 5.11

Sources of Variability Attributed to Each of the Error Terms for the Various Designs
Associated with the Cupcake Examples

Design Structure                             Error Term         Source of Variance
Completely randomized                        Error              Day, batch, oven, cupcake
Randomized complete block                    Error              Batch, oven, cupcake
Split-plot CR whole-plot design structure    Error (oven)       Day, oven
                                             Error (cupcake)    Batch, cupcake
Split-plot RCB whole-plot design structure   Day                Day
                                             Error (oven)       Oven
                                             Error (cupcake)    Batch, cupcake
Strip-plot design structure                  Rectangle or day   Day
                                             Error (oven)       Oven
                                             Error (batch)      Batch
                                             Error (cupcake)    Cupcake

The following sections provide examples of several multilevel designs, each of which
can be expressed as one or a combination of several of the four basic design structures 
described above. Being able to identify the components of a design that correspond to one 
of the four basic design structures provides a method for determining the source of error 
for each size of the experimental units in the study. It is important to be able to describe all 
of the error terms in a model as those descriptions are needed when using software to 
extract an appropriate analysis. When there are repeated measures or unequal variances 
at the residual or smallest size of experimental unit level of a model, the expression for the 
residual sum of squares is needed in order for the software to model the appropriate 
variances and covariances. The modeling of repeated measures experiments will require 
knowing how to compute the residual sum of squares. Most authors compute the residual 
sum of squares by subtraction, but the subtraction method is not sufficient when one needs 
to model the residual variances and covariances. 

Multilevel designs have two important characteristics. First, the treatment structure 
consists of at least a two-way set of treatment combinations. The second characteristic that 
distinguishes the multilevel designs from those in Chapter 4 is that more than one size of 
experimental unit is used in an experiment. There is one size of experimental unit for each 
level in the design structure. Each size of experimental unit has its own design and treatment
structures and the model can be constructed by combining the models from each
size of experimental unit. Since there is more than one size of experimental unit, there is 
more than one error term used in the analysis; that is, there is one error term for each size 
of experimental unit in the experiment, which is also reflected in the model. 

This chapter presents several examples to demonstrate the principles needed to use the 
four basic design structures discussed in Section 5.2 to properly identify the designed 
experiment employed in a study. Once the experimenter is able to use these principles to 
identify the designed experiments discussed in this chapter, she will be able to use them 
to identify the characteristics of other designs. 

Multilevel designs can be structured in many different ways. The next series of examples
is used to demonstrate the process of identifying the different sizes of experimental
units and then this information is used to construct an appropriate model on which to 
base the analysis. Each example includes an analysis of variance table that lists the sources 
of variation, the degrees of freedom, and expected mean squares (see Chapter 18 for a 
discussion on computing expected mean squares). The design structures are related to 
the four basic design structures so that the form of the error terms can be determined. 
It is important to list the appropriate sources of variation and the corresponding degrees 
of freedom for an analysis before subjecting the data to computer analysis, as it provides 
an excellent check on whether appropriate model specification code was used to describe 
the data. 


5.2 Hierarchical Design: A Multilevel Design Structure 

Hierarchical designs are often used in the social sciences where groups of individuals 
form the larger size of experimental unit and the individuals within the group are the 
smaller size of experimental unit. For example, a study to evaluate methods of teaching 
mathematics to fifth graders involved selecting six classes of fifth graders from a school 
system and randomly assigning each of two methods to three of the classes. The classes
are the experimental units for teaching methods. It is of interest to determine if the
teaching methods have different effects on male and female students. The student is the
experimental unit for sex of student. The individuals are nested within sex of student
within a class and the classes are nested within teaching method. This study involves a
nested design structure and a two-way treatment structure with two teaching methods by
two sexes of students. A model that can be used to describe a student's score on a math test
after being taught by one of the teaching methods is

$$y_{ijkm} = \mu + \tau_i + c_{ij} + \beta_k + (\tau\beta)_{ik} + \varepsilon_{ijkm}, \quad i = 1, 2, \; j = 1, 2, 3, \; k = 1, 2, \; m = 1, 2, \ldots, n_{ijk}$$

where $y_{ijkm}$ is the score from the mth student of the kth sex in the jth class taught by the ith
method, $\mu$ denotes the mean score, $\tau_i$ denotes the teaching method effect, $c_{ij}$ denotes the
effect of the jth class taught by the ith method, $\beta_k$ denotes the kth sex effect, $(\tau\beta)_{ik}$ denotes
the teaching method by sex interaction, and $\varepsilon_{ijkm}$ denotes the student effect within a sex of
a classroom taught by a teaching method. The analysis of variance table for the above
model is given in Table 5.12, where the classes and the students are assumed to be random
effects (see Chapter 18). The coefficients $k_1$ and $k_2$ depend on the numbers of students of
each sex within the classrooms (see Chapter 18 for the evaluation of expected mean
squares). This model has two error terms, one for classes and one for students, and this is
similar to the structure in the split-plot design structure. Hence, the hierarchical design is
identical to the split-plot basic design structure.


TABLE 5.12

Analysis of Variance Table for the Teaching Method Study Using a
Hierarchical or Split-Plot Design Structure

Source                              df   EMS
Method                               1   $\sigma^2_\varepsilon + k_2\sigma^2_{class} + \phi^2(\tau)$
Classes (method) = error (class)     4   $\sigma^2_\varepsilon + k_1\sigma^2_{class}$
Sex                                  1   $\sigma^2_\varepsilon + \phi^2(\beta)$
Sex × method                         1   $\sigma^2_\varepsilon + \phi^2(\tau\beta)$
Error (student)                      8   $\sigma^2_\varepsilon$


5.3 Split-Plot Design Structures: Two-Level Design Structures 

Split-plot designs are used mainly in agricultural, industrial, and biological research, but 
they can also be used effectively in most other areas of research. A split-plot design 
structure is a multilevel design with two or more levels. Split-plot designs involve two- or 
higher-way treatment structures with an incomplete block design structure and at least 
two different sizes of experimental units. The feature that distinguishes split-plot designs 
from repeated measures designs is that the levels of each of the factors in the treatment 
structure can be randomly applied to the various sizes of experimental units. In contrast, 
repeated measures designs involve a step where the levels of at least one of the factors in 
the treatment structure (usually time) cannot be assigned at random to the respective 
experimental units. The following examples demonstrate the uses of split-plot designs and
provide guides for identifying the appropriate design.


5.3.1 Example 5.1: Cooking Beans—The Simplest Split-Plot or
Two-Level Design Structure

An experimenter wants to study how five varieties of beans respond to three cooking 
methods. The dependent variables of interest are the tenderness and flavor of the beans 
after being cooked. The experimenter has a field consisting of 15 homogeneous rows. He 
randomly assigns each one of the five varieties of the one-way treatment structure to three 
rows, thus generating a one-way treatment structure in a completely randomized design 
structure. The varieties are assigned to the rows, as shown in Figure 5.10; hence, the rows 
are the experimental units associated with the varieties. At harvest time, the beans from 
each row are put into a box. For some measurement made on a row of beans (or a box), the 
model for the row experimental unit is 

$$y_{ij} = \mu_i + r_{ij}, \quad i = 1, 2, 3, 4, 5, \; j = 1, 2, 3$$

where $\mu_i$ represents the mean of the ith variety and $r_{ij}$ denotes the error associated with
the ith variety being assigned to the jth row. An analysis of variance table for the row
model used to compare the mean response of the varieties is given in Table 5.13. There are
three rows assigned to each variety; thus, the variability among rows treated alike within
a variety provides two degrees of freedom for error (row). If the variances of the rows
within the five varieties are equal, then pool the variances together to provide the
10 degrees of freedom associated with error (row). The row design is a completely
randomized design structure.

FIGURE 5.10 Randomization scheme for assigning the varieties to the rows for the cooking beans experiment.


TABLE 5.13

Analysis of Variance Table for the Row Analysis to Compare
Varieties for the Cooking Beans Example

Source of Variation   df   EMS
Variety                4   $\sigma^{*2}_{row} + \phi^2(\mu)$
Error (row)           10   $\sigma^{*2}_{row}$


Next, the experimenter wants to examine the cooking methods. There are several possible
ways to carry out this part of the experiment, of which two are discussed. First, the
experimenter could assign a different cooking method to each of the three rows planted
with a given variety, as shown in Figure 5.11. The arrangement in Figure 5.11 produces a
two-way treatment structure in a completely randomized design structure where the rows
are the experimental units. However, there is only one replication or row for each variety
by cooking method combination. Hence there are no rows treated alike, which means
there is no measure of the experimental error or row variance. The resulting analysis of
variance table is shown in Table 5.14.

A design with zero degrees of freedom for error is not very desirable (although some 
analyses can be done using the two-way non-replicated experiment techniques discussed 
in Milliken and Johnson, 1989; Milliken and Graybill, 1970; and Johnson and Graybill, 
1972). Another way of assigning the cooking methods avoids the zero degrees of freedom 
problem, but does make the experimental design and analysis more complex. 

FIGURE 5.11 Randomization scheme of assigning cooking methods (C1, C2, and C3) to rows within each variety
for the cooking beans example.


TABLE 5.14

Analysis of Variance Table for the Row Analysis to Compare
Varieties by Cooking Methods for the Cooking Beans Example

Source of Variation        df   EMS
Variety                     4   $\sigma^{*2}_{row} + \phi^2(\text{variety})$
Cooking                     2   $\sigma^{*2}_{row} + \phi^2(\text{cook})$
Variety × cooking method    8   $\sigma^{*2}_{row} + \phi^2(\text{variety} \times \text{cook})$
Error (row)                 0   0


The alternative method is to split each box of beans (one box is obtained from each row)
into three batches and then randomly assign each of the cooking methods to one of the
three batches within a row. Since a cooking method is assigned to a batch, the experimental
unit for the cooking method is a batch. Thus, there are two sizes of experimental
units for this experiment; the row (large size) is the experimental unit for varieties and the
batch (smaller size) is the experimental unit for cooking treatments. Such an assignment is
displayed in Figure 5.12.

The treatment and design structures for the batch experimental units are a one-way 
treatment structure in a randomized complete block design structure where the rows (or 
boxes) are the blocks. The analysis of variance table for the batch part of the design is 
given in Table 5.15. 

FIGURE 5.12 Diagram showing randomization process of assigning cooking methods to batches (lines are
shown for rows 1 and 15 only).


TABLE 5.15

Analysis of Variance Table for the Batch Analysis, Ignoring Varieties, for
the Cooking Beans Example

Source of Variation                df   EMS
Rows                               14   $\sigma^2_\varepsilon + 3\sigma^2_{row} + ?$
Cooking methods                     2   $\sigma^2_\varepsilon + \phi^2(\text{cooking method})$
Residual = rows × cooking method   28   $\sigma^2_\varepsilon + ?$


The sum of squares for rows consists of the sum of squares for variety plus the sum of
squares for error (row) from Table 5.13. If all rows were treated alike, the cooking method
by row interaction would provide the batch error term, but some rows are planted to
variety 1, others to variety 2, and so on. Thus the cooking method by row interaction
includes the cooking method by variety interaction. So, consider only those rows planted to
variety 1. The row by cooking method interaction within variety 1 provides (3 - 1) × (3 - 1) = 4
degrees of freedom for measuring how batches vary when treated alike within a row. The
batch error sum of squares is obtained by pooling the row by cooking method interaction
sum of squares within a variety across the five varieties, yielding 20 degrees of freedom for
error (batch). A model to describe data from the cooking beans example is

$$y_{ijk} = \mu_{ik} + r_{ij} + \varepsilon_{ijk}, \quad i = 1, 2, 3, 4, 5, \; j = 1, 2, 3, \; k = 1, 2, 3$$

where $\mu_{ik}$ denotes the mean of the ith variety cooked by the kth method, $r_{ij}$ is the random
effect of the jth row assigned to the ith variety that is assumed to be distributed as
$N(0, \sigma^2_{row})$, and $\varepsilon_{ijk}$ denotes the random effect of the batch from the jth row of the ith variety
cooked with the kth method that is assumed to be distributed as $N(0, \sigma^2_{batch})$. It is also
assumed that $\varepsilon_{ijk}$ and $r_{ij}$ are independent random variables.
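The two sum-of-squares statements above (rows = variety + error (row), and the batch error obtained by pooling the row by cooking method interaction within each variety) can be verified numerically on balanced data. The sketch below simulates data from a model of the form just given; the effect sizes, standard deviations, and seed are arbitrary assumptions, and all variable names are illustrative.

```python
import math
import random
from statistics import fmean

A, R, B = 5, 3, 3  # varieties, rows per variety, cooking methods
rng = random.Random(1)

# Simulated responses y[i][j][k]: variety i, row j within variety i, method k.
# The fixed effects and standard deviations below are arbitrary assumptions.
row_effect = [[rng.gauss(0, 2.0) for _ in range(R)] for _ in range(A)]
y = [[[10 + i + 0.5 * k + row_effect[i][j] + rng.gauss(0, 1.0)
       for k in range(B)] for j in range(R)] for i in range(A)]

cells = [(i, j, k) for i in range(A) for j in range(R) for k in range(B)]
grand = fmean(y[i][j][k] for i, j, k in cells)
var_mean = [fmean(y[i][j][k] for j in range(R) for k in range(B)) for i in range(A)]
row_mean = [[fmean(y[i][j]) for j in range(R)] for i in range(A)]
cook_mean = [fmean(y[i][j][k] for i in range(A) for j in range(R)) for k in range(B)]
cell_mean = [[fmean(y[i][j][k] for j in range(R)) for k in range(B)] for i in range(A)]

ss_total = sum((y[i][j][k] - grand) ** 2 for i, j, k in cells)

# Row (whole-plot) analysis.
ss_rows = B * sum((row_mean[i][j] - grand) ** 2 for i in range(A) for j in range(R))
ss_variety = B * R * sum((var_mean[i] - grand) ** 2 for i in range(A))
ss_error_row = B * sum((row_mean[i][j] - var_mean[i]) ** 2
                       for i in range(A) for j in range(R))

# Batch (subplot) analysis.
ss_cook = A * R * sum((cook_mean[k] - grand) ** 2 for k in range(B))
ss_var_cook = R * sum((cell_mean[i][k] - var_mean[i] - cook_mean[k] + grand) ** 2
                      for i in range(A) for k in range(B))

# Direct computation: row x cooking method interaction within each variety, pooled.
ss_error_batch = sum((y[i][j][k] - row_mean[i][j] - cell_mean[i][k] + var_mean[i]) ** 2
                     for i, j, k in cells)
# The same quantity obtained by subtraction.
ss_by_subtraction = ss_total - ss_rows - ss_cook - ss_var_cook

print("rows = variety + error(row):",
      math.isclose(ss_rows, ss_variety + ss_error_row, rel_tol=1e-9, abs_tol=1e-9))
print("direct batch error = subtraction:",
      math.isclose(ss_error_batch, ss_by_subtraction, rel_tol=1e-9, abs_tol=1e-9))
```

For balanced data the direct and subtraction routes agree; the direct form is the one that generalizes when the residual variances and covariances themselves must be modeled.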

The mean $\mu_{ik}$ can be expressed in an effects model as

$$\mu_{ik} = \mu + \nu_i + \omega_k + (\nu\omega)_{ik}$$

where $\mu$ denotes the overall mean, $\nu_i$ denotes the effect of the ith variety, $\omega_k$ denotes the
effect of the kth cooking method, and $(\nu\omega)_{ik}$ denotes the variety by cooking method
interaction. The above model can be expressed with terms representing the two sizes of
experimental units as

$$y_{ijk} = \underbrace{\mu + \nu_i + r_{ij}}_{\text{row part of model}} + \underbrace{\omega_k + (\nu\omega)_{ik} + \varepsilon_{ijk}}_{\text{batch part of model}}$$

where the row part of the model is also the blocking structure for the batch part of the
model. A split-plot design can be analyzed in two steps by carrying out the row analysis
and then carrying out the batch part of the analysis. When the data set is balanced, identical
results will be obtained whether one fits a single model to the data or carries
out the two-step analysis process. The analysis of variance table for the cooking beans
example is given in Table 5.16, which partitions the analysis for each of the experimental
unit sizes, the row size and the batch size. The row analysis is also the blocking structure
for the batch part of the analysis; the 14 degrees of freedom for rows in the batch analysis
are the 4 for variety plus the 10 for error (row).

TABLE 5.16

Analysis of Variance Table for the Cooking Bean Experiment Showing the
Analysis for Each Size of Experimental Unit

Source                        df   EMS
Row Analysis
  Variety                      4   $\sigma^2_{batch} + 3\sigma^2_{row} + \phi^2(\nu)$
  Error (row)                 10   $\sigma^2_{batch} + 3\sigma^2_{row}$
Batch Analysis
  Row                         14
  Cooking method               2   $\sigma^2_{batch} + \phi^2(\omega)$
  Variety × cooking method     8   $\sigma^2_{batch} + \phi^2(\nu\omega)$
  Error (batch)               20   $\sigma^2_{batch}$


The whole-plot or row design structure for the cooking beans experiment is a completely
randomized design structure; hence, this is called the simplest split-plot or two-level
design structure. The usual whole-plot design structure generally involves a randomized
complete block design structure as demonstrated by the next example.


5.3.2 Example 5.2: Grinding Wheat—The Usual Split-Plot or 
Two-Level Design Structure 

A grain milling experiment consists of evaluating the properties of various varieties of 
wheat after the wheat kernels are milled or ground into wheat flour. The experiment 
consists of setting the gap between the grinding rollers (called roll gap) to a value and then 
grinding a batch of each of the varieties (in a random order). Next the gap between 
the grinding rollers is changed to another value and new batches are ground. Suppose the 
researcher wishes to evaluate three roll gaps and five varieties. Thus, one replication of the 
two-way treatment structure of three roll gaps by five varieties requires 15 runs of 
the flour mill. One replication or 15 runs can be accomplished during one work day, so four 
days are required to obtain four replications. The environmental conditions such as 
humidity can have an effect on the milling process and these conditions can be different 
from day to day. Thus, to help control these conditions, day is used as a blocking factor 
where one replication of the 15 treatment combinations is obtained during one day. The 
randomization process is to randomly assign the order of the three roll gaps to a roll gap 
run within each day, then, within each of the roll gaps, to randomly assign the order of the 
varieties to batches to be milled. Figure 5.13 contains the diagram displaying the randomization
process, where a day corresponds to a block, a group of five runs within a day is the
whole-plot experimental unit, and a single run is the subplot experimental unit. Not all of 
the arrows showing the assignment of varieties to runs are shown as the complete set of 
lines would clutter the display. The whole-plot design structure is a randomized complete 
block with four blocks of size three. The whole-plot model can be expressed as 

$$y_{ij} = \mu + R_i + d_j + e_{ij}, \quad i = 1, 2, 3, \; j = 1, 2, 3, 4, \quad \text{where } d_j \sim \text{i.i.d. } N(0, \sigma^2_{day}) \text{ and } e_{ij} \sim \text{i.i.d. } N(0, \sigma^2_{run})$$

where $R_i$ denotes the roll gap effect, $d_j$ denotes the random day effect, and $e_{ij}$ denotes the
random roll gap run effect or whole-plot error. The whole-plot analysis of variance table is
given in Table 5.17, where the run or whole plot error is computed from the day by roll gap
interaction; that is, the error for a randomized complete block design structure is the treatment
structure by design structure interaction. The next step in the analysis is to determine
the source of the batch error. This is accomplished by considering only those runs with
one roll gap, say 1.5 mm. The analysis of variance table to compare the varieties at a roll gap
of 1.5 mm is displayed in Table 5.18. The error term is computed by the variety by day (or
run within a day since only one run is used for a given day) interaction, which provides
12 degrees of freedom to measure how batches treated alike vary within a run. This process
is carried out for the other two roll gap settings, each providing 12 degrees of freedom for
batch error. If these three variances are equal, then they can be pooled into the batch error
term with 36 degrees of freedom. The batch error term can be expressed as variety × day
(roll gap), that is, the variety by day interaction pooled across the levels of roll gap.
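The pooling argument above can be written as a short degree-of-freedom calculation. This is a minimal sketch with a hypothetical function name, using the four days, three roll gaps, and five varieties of the milling experiment.

```python
def split_plot_rcb_df(blocks: int, whole_levels: int, sub_levels: int) -> dict:
    """Degrees of freedom for a split-plot with an RCB whole-plot design structure."""
    # Whole-plot error: block x whole-plot treatment interaction.
    error_whole = (blocks - 1) * (whole_levels - 1)
    # Subplot error at one whole-plot level: block x subplot treatment interaction.
    error_sub_one_level = (blocks - 1) * (sub_levels - 1)
    return {
        "Day or block": blocks - 1,
        "Roll gap": whole_levels - 1,
        "Error (run)": error_whole,
        "Variety": sub_levels - 1,
        "Variety x roll gap": (sub_levels - 1) * (whole_levels - 1),
        # Pooled across the whole-plot treatment levels.
        "Error (batch)": whole_levels * error_sub_one_level,
    }


df = split_plot_rcb_df(blocks=4, whole_levels=3, sub_levels=5)
for source, value in df.items():
    print(f"{source:20s}{value:3d}")
print("total:", sum(df.values()))
```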

A model that includes the run and batch parts of the model can be expressed as 

$$y_{ijk} = \mu + R_i + d_j + e_{ij} + V_k + (RV)_{ik} + \varepsilon_{ijk}, \quad i = 1, 2, 3, \; j = 1, 2, 3, 4, \; k = 1, 2, \ldots, 5,$$

where

$$d_j \sim \text{i.i.d. } N(0, \sigma^2_{day}), \quad e_{ij} \sim \text{i.i.d. } N(0, \sigma^2_{run}), \quad \text{and} \quad \varepsilon_{ijk} \sim \text{i.i.d. } N(0, \sigma^2_{batch})$$

In the above model $R_i$ denotes the ith roll gap effect, $V_k$ denotes the kth variety effect, $(RV)_{ik}$
denotes the roll gap by variety interaction, and $\varepsilon_{ijk}$ denotes the random batch effect.
$\mu + R_i + d_j + e_{ij}$ is the whole plot or run part of the model and it is also the blocking structure
for the subplot or batch part of the model. The batch part of the model is $V_k + (RV)_{ik} + \varepsilon_{ijk}$.


TABLE 5.17

Whole-Plot Analysis for the Flour Milling Experiment

Source                        df   EMS
Day or block                   3   $\sigma^{*2}_{run} + 3\sigma^{*2}_{day}$
Roll gap                       2   $\sigma^{*2}_{run} + \phi^2(R)$
Error (run) = day × roll gap   6   $\sigma^{*2}_{run}$


TABLE 5.18

Analysis of Variance for Comparing Varieties at Roll
Gap of 1.5 mm for the Flour Milling Experiment

Source                          df   EMS
Day or block                     3   $\sigma^2_{batch} + 5\sigma^{*2}_{run}$
Variety                          4   $\sigma^2_{batch} + \phi^2(V)$
Error (batch) = day × variety   12   $\sigma^2_{batch}$


TABLE 5.19

Analysis of Variance Table for the Flour Grinding Experiment Showing the
Analysis for Each Size of Experimental Unit

Source                                      df   EMS
Run Analysis
  Day or block                               3   $\sigma^2_{batch} + 5\sigma^2_{run} + 15\sigma^2_{day}$
  Roll gap                                   2   $\sigma^2_{batch} + 5\sigma^2_{run} + \phi^2(R)$
  Error (run) = day × roll gap               6   $\sigma^2_{batch} + 5\sigma^2_{run}$
Batch Analysis
  Run                                       11
  Variety                                    4   $\sigma^2_{batch} + \phi^2(V)$
  Variety × roll gap                         8   $\sigma^2_{batch} + \phi^2(VR)$
  Error (batch) = variety × day (roll gap)  36   $\sigma^2_{batch}$

The final analysis of variance table that includes both the run and batch analyses is
displayed in Table 5.19. The run or whole plot part of the model is the blocking structure
for the batch part of the model; the 11 degrees of freedom for runs in the batch analysis are
the sum of those for day, roll gap, and error (run). As indicated previously, a
split-plot design with a randomized complete block whole-plot design structure is the
usual split-plot design.
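The two-stage randomization described for the milling experiment (roll gaps randomized to runs within each day, then varieties randomized to batches within each run) can be sketched as follows. The function name, the labels for roll gaps and varieties, and the seed are all hypothetical; only the structure of the randomization is taken from the text.

```python
import random


def randomize_milling(days, roll_gaps, varieties, seed=None):
    """Return a run sheet as (day, run, roll_gap, batch, variety) tuples."""
    rng = random.Random(seed)
    plan = []
    for day in days:
        # Stage 1: random order of the roll gaps within the day (whole plots).
        gap_order = list(roll_gaps)
        rng.shuffle(gap_order)
        for run, gap in enumerate(gap_order, start=1):
            # Stage 2: random order of the varieties within the run (subplots).
            variety_order = list(varieties)
            rng.shuffle(variety_order)
            for batch, variety in enumerate(variety_order, start=1):
                plan.append((day, run, gap, batch, variety))
    return plan


# Labels below are placeholders, not values from the study.
plan = randomize_milling(
    days=[1, 2, 3, 4],
    roll_gaps=["gap 1", "gap 2", "gap 3"],
    varieties=["V1", "V2", "V3", "V4", "V5"],
    seed=2024,
)
for row in plan[:5]:
    print(row)
print("total runs of the mill:", len(plan))
```

Each day contains every roll gap exactly once and each roll gap run contains every variety exactly once, which is what makes the day a complete block and the run the whole-plot experimental unit.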


5.3.3 Example 5.3: Baking Bread—Split-Plot with Incomplete 
Block Design Structure 

A bakery scientist designed a study to evaluate the effect of temperature on the volume of
loaves of bread made from two different recipes and baked at three different temperatures.
Figure 5.14 displays the randomization process for this experiment. Only two ovens were
available on a given day and they could be used just once during that day. Each oven can
hold one loaf of bread from each recipe, but only two of the three temperatures could be
observed on a given day. The researcher wanted to have four replications of each temperature,
meaning that the study needed to be conducted on six different days. Table 5.20
contains the assignment of temperatures to the six days. Using Figure 5.14, the randomization
of temperatures to ovens is displayed, where day is the block and the oven is the
experimental unit for the levels of temperature. The oven design is a one-way treatment
structure (levels of temperature) in an incomplete block design structure. A model that
could be used to describe data from each oven is

$$y_{ij} = \mu + T_i + d_j + o_{ij},$$

$$(i, j) \in \{(1,1), (2,1), (1,2), (3,2), (2,3), (3,3), (1,4), (3,4), (2,5), (3,5), (1,6), (2,6)\}$$

where

$$d_j \sim \text{i.i.d. } N(0, \sigma^2_{day}) \quad \text{and} \quad o_{ij} \sim \text{i.i.d. } N(0, \sigma^2_{oven})$$

FIGURE 5.14 Assignments of temperatures to ovens within a day and recipes to loaves within an oven using
incomplete block whole-plot design structure.

The subscripts (i, j) belong to the index set detailed above and the analysis of variance
table is given in Table 5.21. The expected mean squares are computed for the type III sums
of squares (see Chapter 10). The oven error term is computed from the design structure
by treatment structure interaction or the day by temperature interaction. If all cells in
Table 5.20 were observed, there would be (3 - 1)(6 - 1) = 10 degrees of freedom for the
oven error, but six of the cells are empty, leaving 10 - 6 = 4 degrees of freedom for oven
error. The loaf design is a one-way treatment structure (levels of recipes) in a randomized
complete block design structure where the ovens are the blocks. Not all blocks are treated
alike, so consider those ovens with a common temperature. The part of the treatment
structure for the temperature 160°C is displayed in Table 5.22. The design in Table 5.22 is
a one-way treatment structure (two recipes) in a randomized complete block design structure
with four blocks or ovens or days. There are three degrees of freedom for the loaf
error term from the 160°C temperature data, which corresponds to the recipe by day
interaction. The analysis of variance table for the loaf analysis at 160°C is in Table 5.23.
The loaf error can be computed for each of the temperatures and, if the variances are
equal, the three variances are pooled to provide nine degrees of freedom for error (loaf).
A model that can be used to represent all of the data is

$$y_{ijk} = \mu + T_i + d_j + o_{ij} + R_k + (TR)_{ik} + \varepsilon_{ijk}$$

$$(i, j) \in \{(1,1), (2,1), (1,2), (3,2), (2,3), (3,3), (1,4), (3,4), (2,5), (3,5), (1,6), (2,6)\}, \quad k = 1, 2$$

where

$$d_j \sim \text{i.i.d. } N(0, \sigma^2_{day}), \quad o_{ij} \sim \text{i.i.d. } N(0, \sigma^2_{oven}), \quad \text{and} \quad \varepsilon_{ijk} \sim \text{i.i.d. } N(0, \sigma^2_{loaf})$$


TABLE 5.20

Assignment of Temperatures to Days for Incomplete
Block Whole-Plot Design Structure

Days   Temperatures
1      160°C   175°C   X
2      160°C   X       190°C
3      X       175°C   190°C
4      160°C   X       190°C
5      X       175°C   190°C
6      160°C   175°C   X

Note: X denotes temperature was not observed during that day.


TABLE 5.21

Analysis of Variance Table for the Oven Analysis, Ignoring
Recipes, for the Incomplete Block Design Structure with
EMS Computed Using Type III Sums of Squares

Source of Variation                  df   EMS
Days                                  5   $\sigma^{*2}_{oven} + 1.8\sigma^{*2}_{day}$
Temperatures                          2   $\sigma^{*2}_{oven} + \phi^2(T)$
Error (ovens) = days × temperature    4   $\sigma^{*2}_{oven}$


TABLE 5.22

Design for Recipes Baked with Temperature 160°C

Days   Recipes
1      1   2
2      1   2
4      1   2
6      1   2


TABLE 5.23

Analysis of Variance Table for the Loaf Analysis, Ignoring
Temperatures for the Incomplete Block Design Structure

Source of Variation               df   EMS
Days                               3   $\sigma^2_{loaf} + 2\sigma^{*2}_{oven}$
Recipes                            1   $\sigma^2_{loaf} + \phi^2(R)$
Error (loaves) = days × recipes    3   $\sigma^2_{loaf}$

The final analysis of variance table for this model is in Table 5.24, where the expected mean
squares are computed from the type III sums of squares. The type III sums of squares are
used because of the incomplete block whole plot design structure. The oven part of the
model is $\mu + T_i + d_j + o_{ij}$, which is also the blocking structure for the loaf part of the model.
The loaf part of the model is $R_k + (TR)_{ik} + \varepsilon_{ijk}$. This example demonstrates that any type
of design structure can be used at each of the levels or sizes of experimental units. One of
the exercises involves an incomplete block subplot or smallest size of experimental unit
design structure.


TABLE 5.24

Complete Analysis of Variance Table for the Bread Baking Study with the
Incomplete Block Whole-Plot Design Structure with EMS Computed for Type
III Sums of Squares (See Chapter 10)

Source of Variation                              df   EMS
Days                                              5   $\sigma^2_{loaf} + 2\sigma^2_{oven} + 3.6\sigma^2_{day}$
Temperatures                                      2   $\sigma^2_{loaf} + 2\sigma^2_{oven} + \phi^2(T)$
Error (ovens)                                     4   $\sigma^2_{loaf} + 2\sigma^2_{oven}$
Recipes                                           1   $\sigma^2_{loaf} + \phi^2(R)$
Temperatures × recipes                            2   $\sigma^2_{loaf} + \phi^2(TR)$
Error (loaves) = days × recipes (temperatures)    9   $\sigma^2_{loaf}$
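The error degrees of freedom for the bread baking study can be recovered directly from the assignment of temperatures to days in Table 5.20. The sketch below is an illustration under that layout; the variable names are hypothetical, and the degrees of freedom are obtained by counting rather than by fitting a model.

```python
# Temperatures observed on each day, following Table 5.20.
TEMPS = ["160C", "175C", "190C"]
design = {
    1: ["160C", "175C"],
    2: ["160C", "190C"],
    3: ["175C", "190C"],
    4: ["160C", "190C"],
    5: ["175C", "190C"],
    6: ["160C", "175C"],
}
RECIPES = 2  # one loaf of each recipe in every oven

n_days = len(design)
n_ovens = sum(len(temps) for temps in design.values())

# Oven (whole-plot) analysis: total df minus days and temperatures.
df_days = n_days - 1
df_temps = len(TEMPS) - 1
df_error_oven = (n_ovens - 1) - df_days - df_temps

# Equivalent route: full day x temperature interaction df minus the empty cells.
empty_cells = n_days * len(TEMPS) - n_ovens
df_error_oven_alt = (len(TEMPS) - 1) * (n_days - 1) - empty_cells

# Loaf (subplot) analysis: recipe x day interaction within a temperature, pooled.
replications = {t: sum(t in temps for temps in design.values()) for t in TEMPS}
df_error_loaf = sum((replications[t] - 1) * (RECIPES - 1) for t in TEMPS)

print("ovens:", n_ovens, "empty cells:", empty_cells)
print("replications per temperature:", replications)
print("error (ovens) df:", df_error_oven, "=", df_error_oven_alt)
print("error (loaves) df:", df_error_loaf)
```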


5.3.4 Example 5.4: Meat in Display Case—A Complex Split-Plot or 
Four-Level Design 

A meat scientist wants to study the effects of temperature (T) with three levels, types of 
packaging (P) with two levels, types of lighting (L) with four levels, and intensity of light 
(I) with four levels on the color of meat stored in a meat cooler for seven days. Six coolers 
are available for the experiment and the three temperatures (1, 3, and 5°C) are each assigned
at random to two coolers, as shown in Figure 5.15. 

Each cooler is partitioned into 16 compartments on a 4 x 4 grid (Figure 5.16). The light 
intensities are regulated by their distance above the cooler surface, thus, all partitions in a 
column are assigned the same light intensity. The four types of light are randomly assigned 
to the four partitions within each column. Finally, the two types of packaging are assigned 
to the steaks and both types of packaging are put into each partition. Figure 5.16 shows 
how one such cooler is arranged. 


FIGURE 5.15 Assignments of temperatures to coolers for meat in display case study.

FIGURE 5.16 Assignments of intensities to columns, types of lighting to partitions, and types of packaging to
half-partitions in each cooler for the meat in display case study.


One must first correctly identify the different sizes of experimental units or levels of the 
experiment before an appropriate analysis can be constructed. The experimental units 
for the levels of temperature are the coolers. The cooler design is a one-way treatment 
structure (levels of T or temperature) in a completely randomized design structure. If one 
measurement is made on each cooler, the response could be modeled by 

y,j = p+ T, + c {j , i = 1,2,3, j = 1,2, and c rj ~ i.i.d. N( 0, cooler) 

The analysis of variance table for the cooler model is given in Table 5.25 where the cooler 
error term is computed from the variation of the two coolers within a temperature. 


TABLE 5.25
Analysis of Variance Table for the Cooler Experimental Unit Part of the Model

Source            df
Temperature        2
Error (cooler)     3

[Figure 5.17 diagram: Temperature level 1, Cooler 1 and Cooler 2]

Error (column, T1) = intensity x cooler
Error (column) = intensity x cooler (temperature) with 9 df

FIGURE 5.17 Comparisons of levels of intensity at temperature 1 to start column analysis for the meat in 
display case study. 


The variation between the two coolers within a temperature is pooled across the three
temperatures. Thus there are three degrees of freedom for Error (cooler).

The experimental units for the levels of intensity are the columns of four partitions in
a cooler. The column design consists of a one-way treatment structure (levels of I or
intensity) in a randomized complete block design structure with six blocks or coolers. If all
coolers were treated alike, the column error term would be computed from the intensity
by cooler interaction. But there are three different temperatures, so restrict the analysis
to the two coolers assigned to 1°C. The two coolers assigned to 1°C are displayed in
Figure 5.17. At this point, the design is a one-way treatment structure (four levels of I) in
a randomized complete block design structure (two coolers). If a measurement is made
on each of the columns of these two coolers, a model that can be used to describe the
response is

$$y_{1jk} = \mu + I_k + c_{1j} + d_{1jk}, \quad j = 1, 2, \; k = 1, 2, 3, 4, \quad c_{1j} \sim \text{i.i.d. } N(0, \sigma^2_{\text{cooler}}) \text{ and } d_{1jk} \sim \text{i.i.d. } N(0, \sigma^2_{\text{column}})$$

The analysis of variance table for the column model is shown in Figure 5.17, where the
column error term is computed as the intensity by cooler interaction, providing three
degrees of freedom. This process is repeated for the other two temperatures, each providing
column error terms with three degrees of freedom. If these three variances are equal,
they can be pooled into the error (column) with nine degrees of freedom. The error
(column) can be represented by intensity by cooler (temperature), read as intensity by
cooler interaction pooled across the levels of temperature.

The experimental units for the levels of lighting type are the partitions of a column. The
partition design is a one-way treatment structure (levels of L) in a randomized complete
block design structure with 24 blocks, consisting of the four columns from the six coolers.
If all of the columns were treated alike, the partition error term would be computed as the
type of lighting by column interaction. But all columns are not treated alike as there are
three temperatures by four levels of intensity. Restrict the structure to involve only those
columns from temperature 1°C and intensity $I_1$, as shown in Figure 5.18. The design
associated with Figure 5.18 is a one-way treatment structure (four levels of type of lighting) in
a randomized complete block design structure (two columns, but each column is from a
different cooler).

[Figure 5.18 diagram: Temperature 1, intensity 1; one column of four partitions in each of
Cooler 1 and Cooler 2, with lighting types L1 to L4 assigned to the partitions]

Partition design is one-way treatment structure in an RCB design structure

Source               df
Cooler                1
Light                 3
Error (partition)     3

Error (partition, T1, I1) = light x cooler
Error (partition) = light x cooler (temperature intensity) with 36 df

FIGURE 5.18 Comparison of the levels of lighting at temperature 1 and intensity 1 to start the partition analysis
for the meat in display case study.

If one measurement is made on each partition in Figure 5.18, a model that
could be used to describe the responses is

$$y_{1j1m} = \mu + L_m + d^*_{1j1} + p_{1j1m}, \quad j = 1, 2, \; m = 1, 2, 3, 4, \quad d^*_{1j1} \sim \text{i.i.d. } N(0, \sigma^{2*}_{\text{column}}) \text{ and } p_{1j1m} \sim \text{i.i.d. } N(0, \sigma^2_{\text{partition}})$$

where $d^*_{1j1}$ denotes the combined effect of the cooler and of the column within that cooler. The analysis
of variance table for this partition model is shown in Figure 5.18, where the partition error
term is computed as the lighting by cooler interaction, providing three degrees of freedom.
This process needs to be carried out for the 12 combinations of temperature by intensity,
each providing three degrees of freedom for error (partition). If these 12 variances are
equal, they can be pooled into one term providing the error (partition) with 36 degrees
of freedom. The error (partition) term can be represented as light x intensity x cooler
(temperature).

Finally, the experimental units for the levels of packaging are the half-partitions
(or steaks). The half-partition design is a one-way treatment structure (levels of packaging
or P) in a randomized complete block design structure with 96 blocks, consisting of the
four partitions within each of the four columns from the six coolers. If all partitions were
treated alike, the half-partition error term would be computed by the packaging by partition
interaction. But all partitions are not treated alike as they are assigned to three levels
of temperature, four levels of intensity and four levels of lighting. Select those partitions
assigned to temperature 1°C, intensity $I_1$ and lighting $L_1$, as shown in Figure 5.19. If one
measurement is made on each steak or half-partition in Figure 5.19, a model that can be
used to describe the response is

$$y_{1j11n} = \mu + P_n + p^*_{1j11} + \varepsilon_{1j11n}, \quad j = 1, 2, \; n = 1, 2, \quad p^*_{1j11} \sim \text{i.i.d. } N(0, \sigma^{2*}_{\text{partition}}) \text{ and } \varepsilon_{1j11n} \sim \text{i.i.d. } N(0, \sigma^2_{\frac{1}{2}\text{partition}})$$

[Figure 5.19 diagram: Temperature 1, intensity 1, and lighting 1; one partition in each of
Cooler 1 and Cooler 2, each holding both types of packaging]

½ Partition design is one-way treatment structure in an RCB design structure

Source                  df
Cooler                   1
Packaging                1
Error (½ partition)      1

Error (½ partition, T1, I1, L1) = packaging x cooler
Error (½ partition) = packaging x cooler (temperature, intensity, lighting) with 48 df

FIGURE 5.19 Comparisons of the levels of packaging at temperature 1, intensity 1, and lighting 1 to start the 
half-partition analysis for the meat in display case study. 


The analysis of variance table for this half-partition model is shown in Figure 5.19, where 
the half-partition error term is computed as the packaging by cooler interaction, providing 
one degree of freedom. This process needs to be carried out for the 48 combinations of 
temperature, intensity and lighting. These 48 sums of squares are pooled together to 
provide 48 degrees of freedom for error (half-partition). The error (half-partition) term can 
be represented as packaging x light x intensity x cooler (temperature). 
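The degrees of freedom quoted for the four error terms all follow one pattern: compute a treatment by cooler interaction within a set of units treated alike, then pool over those sets. The sketch below illustrates that arithmetic; the function name `pooled_error_df` is an assumption made for this illustration.

```python
def pooled_error_df(n_levels, n_blocks, n_sets):
    """df of a (treatment x block) interaction computed within each set of
    units treated alike, then pooled over the n_sets sets."""
    return (n_levels - 1) * (n_blocks - 1) * n_sets

n_temps, n_intens, n_lights, n_packs = 3, 4, 4, 2
coolers_per_temp = 2

# Cooler error: two coolers within a temperature, pooled over 3 temperatures.
error_cooler = (coolers_per_temp - 1) * n_temps
# Column error: intensity x cooler, pooled over the 3 temperatures.
error_column = pooled_error_df(n_intens, coolers_per_temp, n_temps)
# Partition error: lighting x cooler, pooled over 12 temperature-intensity sets.
error_partition = pooled_error_df(n_lights, coolers_per_temp, n_temps * n_intens)
# Half-partition error: packaging x cooler, pooled over 48 sets.
error_half = pooled_error_df(n_packs, coolers_per_temp, n_temps * n_intens * n_lights)

print(error_cooler, error_column, error_partition, error_half)
```

Under these counts the four values agree with the 3, 9, 36, and 48 degrees of freedom given in the text.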

The above discussion provides models for each of the four levels in the study, the cooler 
model, the column model, the partition model, and the half-partition model. These models 
can be combined into a single model where the interactions between the factors are added 
together. Since the basic design structure is a split-plot at each level, the treatment structures 
at each level and interaction with the treatment structure from the levels above it are 
included in that level part of the model. A model that can be used to describe the data is 


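$$
\begin{aligned}
y_{ijkmn} = {} & \mu + T_i + c_{ij} && \text{\{cooler part of the model\}} \\
& + I_k + (TI)_{ik} + d_{ijk} && \text{\{column part of the model\}} \\
& + L_m + (TL)_{im} + (IL)_{km} + (TIL)_{ikm} + p_{ijkm} && \text{\{partition part of the model\}} \\
& + P_n + (TP)_{in} + (IP)_{kn} + (TIP)_{ikn} + (LP)_{mn} \\
& + (TLP)_{imn} + (ILP)_{kmn} + (TILP)_{ikmn} + \varepsilon_{ijkmn} && \text{\{half-partition part of the model\}}
\end{aligned}
$$

$$i = 1, 2, 3, \quad j = 1, 2, \quad k = 1, 2, 3, 4, \quad m = 1, 2, 3, 4, \quad n = 1, 2,$$

$$c_{ij} \sim \text{i.i.d. } N(0, \sigma^2_{\text{cooler}}), \quad d_{ijk} \sim \text{i.i.d. } N(0, \sigma^2_{\text{column}}), \quad p_{ijkm} \sim \text{i.i.d. } N(0, \sigma^2_{\text{partition}}), \quad \text{and} \quad \varepsilon_{ijkmn} \sim \text{i.i.d. } N(0, \sigma^2_{\frac{1}{2}\text{partition}})$$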


The analysis of variance table for the above model is given in Table 5.26, where the larger
experimental unit analysis is also the blocking structure for the next smaller size of
experimental unit. This is a model for a split-split-split-plot experiment consisting of four
levels or sizes of experimental units and involves four error terms, one for each level
or size of experimental unit. This is also a hierarchical design structure where the
half-partitions are nested within the partitions, which are nested within the columns, which
are nested within the coolers.

TABLE 5.26
Complete Analysis of Variance Table for the Complex Split-Plot Design for Meat in Display Case

Source                    df    EMS

Cooler Analysis
  Temperature (T)          2    $\sigma^2_{\frac{1}{2}\text{partition}} + 2\sigma^2_{\text{partition}} + 8\sigma^2_{\text{column}} + 32\sigma^2_{\text{cooler}} + \phi^2(T)$
  Error (cooler)           3    $\sigma^2_{\frac{1}{2}\text{partition}} + 2\sigma^2_{\text{partition}} + 8\sigma^2_{\text{column}} + 32\sigma^2_{\text{cooler}}$

Column Analysis
  Block                    5
  Intensity (I)            3    $\sigma^2_{\frac{1}{2}\text{partition}} + 2\sigma^2_{\text{partition}} + 8\sigma^2_{\text{column}} + \phi^2(I)$
  T x I                    6    $\sigma^2_{\frac{1}{2}\text{partition}} + 2\sigma^2_{\text{partition}} + 8\sigma^2_{\text{column}} + \phi^2(TI)$
  Error (column)           9    $\sigma^2_{\frac{1}{2}\text{partition}} + 2\sigma^2_{\text{partition}} + 8\sigma^2_{\text{column}}$

Partition Analysis
  Block                   23
  Lighting (L)             3    $\sigma^2_{\frac{1}{2}\text{partition}} + 2\sigma^2_{\text{partition}} + \phi^2(L)$
  T x L                    6    $\sigma^2_{\frac{1}{2}\text{partition}} + 2\sigma^2_{\text{partition}} + \phi^2(TL)$
  I x L                    9    $\sigma^2_{\frac{1}{2}\text{partition}} + 2\sigma^2_{\text{partition}} + \phi^2(IL)$
  T x I x L               18    $\sigma^2_{\frac{1}{2}\text{partition}} + 2\sigma^2_{\text{partition}} + \phi^2(TIL)$
  Error (partition)       36    $\sigma^2_{\frac{1}{2}\text{partition}} + 2\sigma^2_{\text{partition}}$

Half-partition Analysis
  Blocks                  95
  Packaging (P)            1    $\sigma^2_{\frac{1}{2}\text{partition}} + \phi^2(P)$
  T x P                    2    $\sigma^2_{\frac{1}{2}\text{partition}} + \phi^2(TP)$
  I x P                    3    $\sigma^2_{\frac{1}{2}\text{partition}} + \phi^2(IP)$
  T x I x P                6    $\sigma^2_{\frac{1}{2}\text{partition}} + \phi^2(TIP)$
  L x P                    3    $\sigma^2_{\frac{1}{2}\text{partition}} + \phi^2(LP)$
  T x L x P                6    $\sigma^2_{\frac{1}{2}\text{partition}} + \phi^2(TLP)$
  I x L x P                9    $\sigma^2_{\frac{1}{2}\text{partition}} + \phi^2(ILP)$
  T x I x L x P           18    $\sigma^2_{\frac{1}{2}\text{partition}} + \phi^2(TILP)$
  Error (½ partition)     48    $\sigma^2_{\frac{1}{2}\text{partition}}$

Note: The brackets and arrows indicate which effects form blocks for the next smaller size of experimental
unit.
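In a balanced hierarchical design such as this one, the coefficient of each variance component in an expected mean square equals the number of observations (steaks) contained in one unit of that size. The following sketch computes those coefficients from the counts given for the meat study; the dictionary name `ems_coefficients` is illustrative, and the balanced-design rule is stated here as an assumption rather than taken from the text.

```python
# Units per next-larger unit, from the description of the study.
n_packages_per_partition = 2
n_partitions_per_column = 4
n_columns_per_cooler = 4

# Coefficient of each variance component = steaks per unit of that size.
ems_coefficients = {
    "half-partition": 1,
    "partition": n_packages_per_partition,
    "column": n_packages_per_partition * n_partitions_per_column,
    "cooler": (n_packages_per_partition * n_partitions_per_column
               * n_columns_per_cooler),
}

# Degrees of freedom accumulated through each level of the hierarchy:
# coolers, then columns, then partitions, then half-partitions.
cumulative_df = {
    "cooler": 2 + 3,
    "column": 5 + 3 + 6 + 9,
    "partition": 23 + 3 + 6 + 9 + 18 + 36,
    "half-partition": 95 + 1 + 2 + 3 + 6 + 3 + 6 + 9 + 18 + 48,
}
print(ems_coefficients, cumulative_df)
```

The cumulative totals 5, 23, and 95 are the block degrees of freedom for the next smaller unit, and the final total is 192 - 1 = 191.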





5.4 Strip-Plot Design Structures—A Nonhierarchical Multilevel Design 

The process of constructing a strip-plot design structure is to arrange the experimental
units into rectangles. The levels of one set of factors are randomly assigned to the rows of
each rectangle and the levels of the other set of factors are randomly assigned to the
columns of each rectangle. Thus, the rows are experimental units associated with the first
set of factors and the columns are the experimental units associated with the second set of
factors. As a consequence, the cell or intersection of a row and a column is the experimental
unit associated with the interaction comparisons between the two sets of factors.
An example is used to demonstrate some of the uses of the strip-plot design structure.
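The randomization just described can be sketched in a few lines. This is a minimal illustration, assuming each rectangle is a complete replicate; the function name `randomize_strip_plot` and the treatment labels are hypothetical.

```python
import random

def randomize_strip_plot(row_treatments, col_treatments, n_blocks, seed=0):
    """Independently randomize row treatments to rows and column treatments
    to columns within each rectangle (block)."""
    rng = random.Random(seed)
    layout = []
    for block in range(1, n_blocks + 1):
        rows = rng.sample(row_treatments, len(row_treatments))  # random order
        cols = rng.sample(col_treatments, len(col_treatments))
        for r, row_trt in enumerate(rows, start=1):
            for c, col_trt in enumerate(cols, start=1):
                # The cell (r, c) receives the combination of its row and
                # column treatments; nothing is randomized at the cell level.
                layout.append({"block": block, "row": r, "col": c,
                               "row_trt": row_trt, "col_trt": col_trt})
    return layout

layout = randomize_strip_plot(["A1", "A2", "A3"], ["B1", "B2"], n_blocks=2)
for cell in layout[:6]:
    print(cell)
```

Because whole rows and whole columns receive treatments, the cell inherits its treatment combination, which is why the cell is the experimental unit only for the interaction comparisons.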

5.4.1 Example 5.5: Making Cheese 

A dairy scientist designed a cheese manufacturing study that involved two levels of fat,
three types of cheese, two storage temperatures, and two levels of storage humidity. The
experiment is a two-step process where the first step involves making batches of cheese
with each type of cheese using the two levels of fat. Four one-pound packages of cheese
are made from each of the batches. The second step is to store the cheese in a set of
environmental conditions to allow it to cure for four weeks. The storage part of the study
involves putting one package of cheese from each batch into a chamber assigned one of the
storage temperatures and one of the levels of humidity. This process is equivalent to
arranging 24 packages of cheese into a rectangle with four rows and six columns. The
column corresponds to a batch of cheese and the row corresponds to an environmental
chamber, as shown in Figure 5.20.

[Figure 5.20 diagram: columns (batches) are Cheddar sharp, 2% fat; Cheddar sharp, 4% fat;
Cheddar med, 2% fat; Cheddar med, 4% fat; Cheddar mild, 2% fat; Cheddar mild, 4% fat;
each cell is a one-pound block]

FIGURE 5.20 Randomization scheme for the cheese-making experiment using a strip-plot design structure.

FIGURE 5.21 Randomization scheme for the chamber design for the cheese-making experiment.

The dairy scientist has four chambers available for use
in the study, but he wishes to have four replications. Thus, the four replications are obtained
by carrying out the experiment in four different months. A model and analysis can be
constructed by evaluating the treatment and design structures for each size of experimental
unit. The chamber design is displayed in Figure 5.21, where only the temperature and
humidity levels are shown for each of the four months. The chamber design is a two-way
treatment structure (levels of temperature by levels of humidity) in a randomized complete
block design structure where each month is a block. A model that can be used to
describe a measurement made on each chamber is


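$$y_{ijk} = \mu + T_j + H_k + (TH)_{jk} + m_i + c_{ijk}, \quad i = 1, 2, 3, 4, \; j = 1, 2, \; k = 1, 2,$$

$$m_i \sim \text{i.i.d. } N(0, \sigma^{2*}_{\text{month}}), \quad \text{and} \quad c_{ijk} \sim \text{i.i.d. } N(0, \sigma^{2*}_{\text{chamber}})$$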

Table 5.27 contains the analysis of variance table for the chamber model where the chamber
error term is the treatment structure by design structure interaction. There are four
treatment combinations in the treatment structure and four blocks in the design structure;
thus the chamber error term is based on nine degrees of freedom. The error (chamber)
term is the sum of the temperature by month interaction, humidity by month interaction,
and the temperature by humidity by month interaction terms. If one uses the three-way
interaction term in the SAS®-Mixed procedure, the three interaction terms will be pooled
together into one error term. The variance component for chamber is denoted with an
asterisk since it also includes the variation due to blocks of cheese. The variance component
for month is denoted with an asterisk since it also includes the variation due to batches
and blocks of cheese.

TABLE 5.27
Analysis of Variance for the Chamber Experimental Unit Design for the Making Cheese Experiment

Source                       df    EMS
Month                         3    $\sigma^{2*}_{\text{chamber}} + 4\sigma^{2*}_{\text{month}}$
Temperature                   1    $\sigma^{2*}_{\text{chamber}} + \phi^2(T)$
Humidity                      1    $\sigma^{2*}_{\text{chamber}} + \phi^2(H)$
Temperature x humidity        1    $\sigma^{2*}_{\text{chamber}} + \phi^2(TH)$
Error (chamber)               9    $\sigma^{2*}_{\text{chamber}}$
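The nine degrees of freedom for error (chamber) can be reached in two equivalent ways: directly as treatment combinations by months, or by adding the three interactions with month that are pooled. A short check of that arithmetic, with variable names chosen only for this illustration:

```python
n_months, n_temps, n_hums = 4, 2, 2

# Pieces that are pooled into error (chamber).
temp_by_month = (n_temps - 1) * (n_months - 1)
hum_by_month = (n_hums - 1) * (n_months - 1)
temp_by_hum_by_month = (n_temps - 1) * (n_hums - 1) * (n_months - 1)
pooled_df = temp_by_month + hum_by_month + temp_by_hum_by_month

# Direct computation: (treatment combinations - 1) x (blocks - 1).
direct_df = (n_temps * n_hums - 1) * (n_months - 1)

print(pooled_df, direct_df)
```

Both routes give the same value because the three treatment degrees of freedom split into one each for temperature, humidity, and their interaction.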


The batch design is displayed in Figure 5.22, where only the cheese type and percentage
fat are considered with the four months. The batch design is a two-way treatment structure
(levels of types of cheese by levels of fat) in a randomized complete block design
structure where each month is a block. A model that can be used to describe a measurement
made on each batch is

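$$y_{imn} = \mu + F_m + C_n + (FC)_{mn} + m_i + b_{imn}, \quad i = 1, 2, 3, 4, \; m = 1, 2, \; n = 1, 2, 3,$$

$$m_i \sim \text{i.i.d. } N(0, \sigma^{2*}_{\text{month}}) \quad \text{and} \quad b_{imn} \sim \text{i.i.d. } N(0, \sigma^{2*}_{\text{batch}})$$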

Table 5.28 contains the analysis of variance table for the batch model where the batch
error term is the treatment structure by design structure interaction. There are six
treatment combinations in the treatment structure and four blocks in the design structure;
thus the batch error term is based on 15 degrees of freedom. The error (batch) term is the
sum of the fat by month interaction, cheese type by month interaction, and the fat by
cheese type by month interaction terms. If one uses the three-way interaction term in the
SAS-Mixed procedure, the three interaction terms will be pooled together into one error
term. The variance component for batch is denoted with an asterisk since it also includes
the variation due to blocks of cheese. The variance component for month is denoted with
an asterisk since it also includes the variation due to chambers and blocks of cheese.

[Figure 5.22 diagram: the six batches (Cheddar sharp, med, and mild, each at 2% and 4% fat)
are randomized within each of Month 1 through Month 4]

FIGURE 5.22 Randomization scheme for the batch design for the cheese-making experiment. Some of the lines
are not included to simplify the drawing.

TABLE 5.28
Analysis of Variance for the Batch Experimental Unit Design for the Making Cheese Experiment

Source                 df    EMS
Month                   3    $\sigma^{2*}_{\text{batch}} + 6\sigma^{2*}_{\text{month}}$
Fat                     1    $\sigma^{2*}_{\text{batch}} + \phi^2(F)$
Cheese type             2    $\sigma^{2*}_{\text{batch}} + \phi^2(C)$
Fat x cheese type       2    $\sigma^{2*}_{\text{batch}} + \phi^2(FC)$
Error (batch)          15    $\sigma^{2*}_{\text{batch}}$

The 1 pound cheese block is the experimental unit for interactions between the factors 
assigned to the chambers and factors assigned to the batches. The cheese block part of the 
model is 


$$(TF)_{jm} + (TC)_{jn} + (TFC)_{jmn} + (HF)_{km} + (HC)_{kn} + (HFC)_{kmn} + (THFC)_{jkmn} + \varepsilon_{ijkmn}$$

$$\varepsilon_{ijkmn} \sim \text{i.i.d. } N(0, \sigma^2_{\text{block}})$$

The model for this strip-plot design is obtained by combining the models for the three 
experimental units into one model as 


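$$
\begin{aligned}
y_{ijkmn} = {} & \mu + m_i && \text{\} blocking part of the model} \\
& + T_j + H_k + (TH)_{jk} + c_{ijk} && \text{\} chamber part of the model} \\
& + F_m + C_n + (FC)_{mn} + b_{imn} && \text{\} batch part of the model} \\
& + (TF)_{jm} + (TC)_{jn} + (TFC)_{jmn} + (HF)_{km} \\
& + (HC)_{kn} + (HFC)_{kmn} + (THFC)_{jkmn} + \varepsilon_{ijkmn} && \text{\} cheese block part of the model}
\end{aligned}
$$

$$i = 1, 2, 3, 4, \quad j = 1, 2, \quad k = 1, 2, \quad m = 1, 2, \quad n = 1, 2, 3,$$

$$m_i \sim \text{i.i.d. } N(0, \sigma^2_{\text{month}}), \quad c_{ijk} \sim \text{i.i.d. } N(0, \sigma^2_{\text{chamber}}), \quad b_{imn} \sim \text{i.i.d. } N(0, \sigma^2_{\text{batch}}), \quad \text{and} \quad \varepsilon_{ijkmn} \sim \text{i.i.d. } N(0, \sigma^2_{\text{block}})$$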

The analysis of variance table for the cheese experiment is given in Table 5.29, where
the rows are segregated by size of experimental unit. The cheese block error term is
computed as the chamber treatment structure by batch treatment structure by design
structure interaction providing (4 - 1)(6 - 1)(4 - 1) = 45 degrees of freedom. This design
involves three sizes of experimental units, the chamber size, the batch size, and the block
of cheese size. The blocks of cheese are nested within the batch and the blocks of cheese
are nested within the chamber, but the batches are not nested within the chambers, nor are
chambers nested within batches. Hence this multilevel design structure is not a hierarchical
design structure.

TABLE 5.29
Complete Analysis of Variance Table for the Cheese Making Experiment

Unit / Source                                  df    EMS

Blocking Structure
  Months                                        3    $\sigma^2_{\text{block}} + 4\sigma^2_{\text{batch}} + 6\sigma^2_{\text{chamber}} + 24\sigma^2_{\text{month}}$

Chamber
  Temperature                                   1    $\sigma^2_{\text{block}} + 6\sigma^2_{\text{chamber}} + \phi^2(T)$
  Humidity                                      1    $\sigma^2_{\text{block}} + 6\sigma^2_{\text{chamber}} + \phi^2(H)$
  Temperature x humidity                        1    $\sigma^2_{\text{block}} + 6\sigma^2_{\text{chamber}} + \phi^2(TH)$
  Error (chamber)                               9    $\sigma^2_{\text{block}} + 6\sigma^2_{\text{chamber}}$

Batch
  Fat                                           1    $\sigma^2_{\text{block}} + 4\sigma^2_{\text{batch}} + \phi^2(F)$
  Cheese type                                   2    $\sigma^2_{\text{block}} + 4\sigma^2_{\text{batch}} + \phi^2(C)$
  Fat x cheese type                             2    $\sigma^2_{\text{block}} + 4\sigma^2_{\text{batch}} + \phi^2(FC)$
  Error (batch)                                15    $\sigma^2_{\text{block}} + 4\sigma^2_{\text{batch}}$

Block of cheese
  Temperature x fat                             1    $\sigma^2_{\text{block}} + \phi^2(TF)$
  Temperature x cheese                          2    $\sigma^2_{\text{block}} + \phi^2(TC)$
  Temperature x fat x cheese                    2    $\sigma^2_{\text{block}} + \phi^2(TFC)$
  Humidity x fat                                1    $\sigma^2_{\text{block}} + \phi^2(HF)$
  Humidity x cheese                             2    $\sigma^2_{\text{block}} + \phi^2(HC)$
  Humidity x fat x cheese                       2    $\sigma^2_{\text{block}} + \phi^2(HFC)$
  Temperature x humidity x fat                  1    $\sigma^2_{\text{block}} + \phi^2(THF)$
  Temperature x humidity x cheese               2    $\sigma^2_{\text{block}} + \phi^2(THC)$
  Temperature x humidity x fat x cheese         2    $\sigma^2_{\text{block}} + \phi^2(THFC)$
  Error (cheese block)                         45    $\sigma^2_{\text{block}}$

This example demonstrates that the design structure associated with a given size of
experimental unit can involve any needed treatment structure. The treatment structures
for the chamber and batch experimental units are both two-way factorial arrangements.
One could have a two-way factorial arrangement with one control as the treatment structure
for one of the experimental units. This process of identifying the experimental units
and then specifying the design and treatment structures of each provides a general method
of identifying an appropriate design and corresponding model for complex experiments.
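The claim that the cheese design is multilevel but not hierarchical can be checked mechanically: a smaller unit is nested in a larger one only if every smaller unit belongs to exactly one larger unit. The sketch below builds the 96 packages and tests each pair of unit sizes; the helper names `unit_id` and `is_nested` are hypothetical, introduced for this illustration.

```python
from itertools import product

# One record per one-pound block: 4 months x 4 chambers x 6 batches.
packages = [
    {"month": m, "chamber": c, "batch": b}
    for m, c, b in product(range(1, 5), range(1, 5), range(1, 7))
]

def unit_id(pkg, keys):
    """Identify the unit of a given size that a package belongs to."""
    return tuple(pkg[k] for k in keys)

def is_nested(child_keys, parent_keys):
    """True if every child unit lies inside exactly one parent unit."""
    parents_of = {}
    for pkg in packages:
        child = unit_id(pkg, child_keys)
        parents_of.setdefault(child, set()).add(unit_id(pkg, parent_keys))
    return all(len(parents) == 1 for parents in parents_of.values())

block_keys = ("month", "chamber", "batch")   # a block of cheese
chamber_keys = ("month", "chamber")          # a chamber within a month
batch_keys = ("month", "batch")              # a batch within a month

print(is_nested(block_keys, chamber_keys))   # blocks within chambers
print(is_nested(block_keys, batch_keys))     # blocks within batches
print(is_nested(batch_keys, chamber_keys))   # batches within chambers
print(is_nested(chamber_keys, batch_keys))   # chambers within batches

error_block_df = (4 - 1) * (6 - 1) * (4 - 1)
```

Each batch spreads its four packages over four chambers, and each chamber holds packages from six batches, so neither of those two unit sizes is nested in the other.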


5.5 Repeated Measures Designs 

Repeated measures designs are used effectively in many areas of study. These designs
involve a treatment structure with at least two factors, an incomplete block design structure
and at least two sizes of experimental units. The repeated measures design has the same
type of design structure as a split-plot design, but a repeated measures design differs from
the split-plot type of design in that the levels of one or more factors cannot be randomly
assigned to the corresponding experimental units. Most often, time is the factor whose
levels cannot be randomly assigned to the time intervals of a subject. Repeated measures
designs involving time are designs used for longitudinal studies. Thus, repeated measures
designs involve a step or steps where it is not possible to randomly assign the levels of
some of the factors to their experimental units, whereas in split-plot type designs it is
possible to use randomization at each step. The following examples demonstrate some of the
structures of repeated measures designs and provide a guide for proper identification of
designed experiments.


5.5.1 Example 5.6: Horse Feet—Basic Repeated Measures Design 

This particular experiment leads to the idea of a repeated measures design using two
different sizes of experimental units. However, if the experimenter is not careful, he or she
may inadvertently miss the fact that there are two different sizes of experimental units,
resulting in an incorrect analysis.

A veterinarian has two techniques that can be used to fuse the joint of a horse's foot after
it is broken and she wishes to determine if one technique is better than the other. The
experiment consists of taking some horses, breaking a joint on each horse, repairing it with
one of the two techniques, and determining the strength of the fused joint four months
later. She also wants to determine if the same technique works equally well for front feet
and back feet. Because horses are scarce and expensive, she plans to break the joint on a
front foot, wait until it heals, and then break the joint on a rear foot or vice versa on each
horse. This healing process also introduces a time factor into the design. Thus, the treatment
structure is a $2^3$ factorial arrangement generated from two fusion techniques (F), two
positions (P), and two healing times (T). The design structure is an incomplete block design
where each horse is a block and there are two observations per block (horse). Since the
blocks are incomplete, some of the treatment structure information will be confounded
with block or horse effects (Cochran and Cox, 1957). There are various ways of assigning
the treatment combinations to the two feet of a horse. Suppose there are four horses. One
process is to assign a fusion technique to each horse where both a front and a rear foot are
treated with the same level of fusion, as shown in Table 5.30.

TABLE 5.30
First Assignment of Treatment Combinations for Horse Feet Experiment

                    Horse
1           2           3           4
F1P1T1      F1P1T2      F2P1T1      F2P1T2
F1P2T2      F1P2T1      F2P2T2      F2P2T1

Note: The two fusing techniques are F1 and F2, the two times are T1 and T2, and the
two positions are P1 and P2.

Let $y_{ijkm}$ denote the observation obtained from the ith fusion technique, jth position, kth time,
and mth horse. The model used to describe the data is

$$y_{ijkm} = \mu_{ijk} + h_m + \varepsilon_{ijkm}, \quad i = 1, 2, \; j = 1, 2, \; k = 1, 2, \; m = 1, 2, \ldots, 4$$

where $\mu_{ijk}$ denotes the mean response for the ith fusion technique, jth position, and kth time,
$h_m$ denotes the effect of the mth horse, and $\varepsilon_{ijkm}$ denotes the response error of a measurement
made on a foot of a horse. Two types of comparison can be made among the treatment
combination means, the $\mu_{ijk}$. One type of comparison is the intra-horse (or within-horse)
comparison and the second type of comparison is the inter-horse (or between-horse)
comparison. The factorial effects can be defined as

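$$
\begin{aligned}
\text{Mean} &= \bar{\mu}_{\cdot\cdot\cdot} \\
F &= \bar{\mu}_{1\cdot\cdot} - \bar{\mu}_{2\cdot\cdot}, \quad P = \bar{\mu}_{\cdot 1\cdot} - \bar{\mu}_{\cdot 2\cdot}, \quad T = \bar{\mu}_{\cdot\cdot 1} - \bar{\mu}_{\cdot\cdot 2} \\
F \times P &= \bar{\mu}_{11\cdot} - \bar{\mu}_{12\cdot} - \bar{\mu}_{21\cdot} + \bar{\mu}_{22\cdot} \\
F \times T &= \bar{\mu}_{1\cdot 1} - \bar{\mu}_{1\cdot 2} - \bar{\mu}_{2\cdot 1} + \bar{\mu}_{2\cdot 2} \\
P \times T &= \bar{\mu}_{\cdot 11} - \bar{\mu}_{\cdot 12} - \bar{\mu}_{\cdot 21} + \bar{\mu}_{\cdot 22} \\
F \times P \times T &= \mu_{111} - \mu_{112} - \mu_{121} + \mu_{122} - \mu_{211} + \mu_{212} + \mu_{221} - \mu_{222}
\end{aligned}
$$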


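The best estimator of $\mu_{ijk}$ is $\bar{y}_{ijk\cdot}$. The best estimator of the main effect P can be expressed as

$$
\begin{aligned}
\hat{P} &= \bar{y}_{\cdot 1\cdot\cdot} - \bar{y}_{\cdot 2\cdot\cdot} \\
&= \tfrac{1}{4}\left[(y_{1111} - y_{1221}) + (y_{1122} - y_{1212}) + (y_{2113} - y_{2223}) + (y_{2124} - y_{2214})\right]
\end{aligned}
$$

which is an intra-horse or within-horse comparison. The intra-horse comparison is easily
justified by substituting the right-hand side of the above model for $y_{ijkm}$ in $\hat{P}$, which gives

$$
\begin{aligned}
\hat{P} = {} & \tfrac{1}{4}\big[(\mu_{111} + h_1 + \varepsilon_{1111} - \mu_{122} - h_1 - \varepsilon_{1221}) + (\mu_{112} + h_2 + \varepsilon_{1122} - \mu_{121} - h_2 - \varepsilon_{1212}) \\
& + (\mu_{211} + h_3 + \varepsilon_{2113} - \mu_{222} - h_3 - \varepsilon_{2223}) + (\mu_{212} + h_4 + \varepsilon_{2124} - \mu_{221} - h_4 - \varepsilon_{2214})\big] \\
= {} & \bar{\mu}_{\cdot 1\cdot} - \bar{\mu}_{\cdot 2\cdot} + \tfrac{1}{4}\left[(\varepsilon_{1111} - \varepsilon_{1221}) + (\varepsilon_{1122} - \varepsilon_{1212}) + (\varepsilon_{2113} - \varepsilon_{2223}) + (\varepsilon_{2124} - \varepsilon_{2214})\right]
\end{aligned}
$$

Note that the horse effects $h_m$ subtract out of the expression. The variance of $\hat{P}$ depends on
the variance of the $\varepsilon_{ijkm}$ and not the variance of the $h_m$. The variance of $\hat{P}$ is $\text{Var}(\hat{P}) = \sigma^2_\varepsilon/2$.
The best estimator of F can be expressed as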


$$
\begin{aligned}
\hat{F} &= \bar{y}_{1\cdot\cdot\cdot} - \bar{y}_{2\cdot\cdot\cdot} \\
&= \tfrac{1}{4}\left[(y_{1111} + y_{1221}) + (y_{1122} + y_{1212}) - (y_{2113} + y_{2223}) - (y_{2124} + y_{2214})\right]
\end{aligned}
$$

This estimator is a between-horse comparison; that is, it is a comparison of the mean of
horses 1 and 2 with the mean of horses 3 and 4, and thus it depends on the horse effects.
The dependency on the horse effects is observed by expressing $\hat{F}$ in terms of the right-hand
side of the model as

$$
\begin{aligned}
\hat{F} = {} & \tfrac{1}{4}\big[(\mu_{111} + h_1 + \varepsilon_{1111} + \mu_{122} + h_1 + \varepsilon_{1221}) + (\mu_{112} + h_2 + \varepsilon_{1122} + \mu_{121} + h_2 + \varepsilon_{1212}) \\
& - (\mu_{211} + h_3 + \varepsilon_{2113} + \mu_{222} + h_3 + \varepsilon_{2223}) - (\mu_{212} + h_4 + \varepsilon_{2124} + \mu_{221} + h_4 + \varepsilon_{2214})\big] \\
= {} & \bar{\mu}_{1\cdot\cdot} - \bar{\mu}_{2\cdot\cdot} + \tfrac{1}{4}\left[(\varepsilon_{1111} + \varepsilon_{1221}) + (\varepsilon_{1122} + \varepsilon_{1212}) - (\varepsilon_{2113} + \varepsilon_{2223}) - (\varepsilon_{2124} + \varepsilon_{2214})\right] \\
& + \tfrac{1}{2}\left[h_1 + h_2 - h_3 - h_4\right]
\end{aligned}
$$

which involves the $h_m$. The variance of $\hat{F}$ depends on the variance of the $h_m$ and the variance
of the $\varepsilon_{ijkm}$, or $\text{Var}(\hat{F}) = \tfrac{1}{2}[\sigma^2_\varepsilon + 2\sigma^2_{\text{horse}}]$. Similarly, it can be shown that P, T, F x T and F x P are
within-horse effects while F, P x T and F x P x T are between-horse effects. Since the
between-horse effects involve the $h_m$, they are said to be confounded with horse effects;
that is, F, P x T and F x P x T are confounded with horse effects.
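The two variances can be verified from the contrast coefficients alone. The sketch below assumes, as the model implies, that the horse effects and the foot errors are mutually independent with variances $\sigma^2_{\text{horse}}$ and $\sigma^2_\varepsilon$; the design dictionary is the assignment in Table 5.30, and the function names are illustrative.

```python
from fractions import Fraction

# First assignment: horse -> the two (F, P, T) combinations on its feet.
design = {
    1: [(1, 1, 1), (1, 2, 2)],
    2: [(1, 1, 2), (1, 2, 1)],
    3: [(2, 1, 1), (2, 2, 2)],
    4: [(2, 1, 2), (2, 2, 1)],
}

def contrast(effect):
    """Coefficient of each observation in the estimator of a main effect:
    +1/4 at level 1 of the factor, -1/4 at level 2."""
    idx = "FPT".index(effect)
    return {(horse, combo): Fraction(1 if combo[idx] == 1 else -1, 4)
            for horse, combos in design.items() for combo in combos}

def variance_multipliers(coefs):
    """Return (a, b) such that Var(estimator) = a*sigma2_eps + b*sigma2_horse."""
    eps_mult = sum(c * c for c in coefs.values())
    horse_mult = sum(
        sum(c for (h, _), c in coefs.items() if h == horse) ** 2
        for horse in design
    )
    return eps_mult, horse_mult

print(variance_multipliers(contrast("P")))
print(variance_multipliers(contrast("F")))
```

For P the coefficients within each horse sum to zero, so the horse variance drops out; for F they sum to plus or minus one half, which contributes a full $\sigma^2_{\text{horse}}$ to the variance.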

This design consists of a three-way treatment structure in an incomplete block design
structure where each horse is a block. Since some of the comparisons are comparisons
between horses and others are comparisons between feet "within" a horse, this experiment
involves two sizes of experimental units. The feet are the smaller-size experimental units,
while the horses are the larger-size experimental units. The $h_m$ term in the model represents
the horse error (that is, variation due to differences among horses treated alike), while
the $\varepsilon_{ijkm}$ term represents the foot error (that is, variation due to differences among feet
treated alike on the same horse). When designing experiments involving different sizes of
experimental units, it is desirable to choose those effects that are most important so that
they involve comparisons between the smaller experimental units and to let those effects
that are least important involve comparisons between larger experimental units. However,
this type of arrangement is not always possible. For example, if the horse foot experiment
involved two types of horses (say, racing and working), it would be impossible for types of
horse to be other than a between-horse comparison. In the horse foot experiment, which
has only three factors, the experimenter is most interested in comparing the two fusion
techniques. The design given in Table 5.30 is such that the fusion effect is confounded with
horses, resulting in less precision for comparing the fusion technique means than is desired.
The design given in Table 5.31 yields a comparison between the fusion technique means
that is not confounded with horses and thus achieves the goal of having the most important
effect being compared using the variability associated with the smaller experimental
unit, the horse feet.

The model for the first arrangement can also be used to represent data from this second
assignment of treatment combinations to the horses or blocks. Using the same techniques
as for the first arrangement, it can be shown that the F, P, T, and F x P x T effects are
within-horse or intra-horse comparisons and that the F x T, F x P, and P x T effects are inter-horse
comparisons and these effects are confounded with the horse effects. Neither of the above two
designs yields enough observations to provide any degrees of freedom for estimating the
two error terms. In order to obtain some degrees of freedom for the two types of error
variances, one could repeat the design in Table 5.31 using eight horses where two horses
are randomly assigned to each set of treatment combinations. The analysis would still
consist of two parts, a between-horse analysis and a within-horse or feet-within-horse
analysis. There would be eight within-horse comparisons (one from each horse) that
can be partitioned into estimates of the F, T, P, and F x P x T effects and an estimate of the
within-horse error variance, denoted by error (feet). There are seven between-horse
comparisons which can be partitioned into estimates of the F x T, F x P, and P x T effects
and an estimate of the between-horse error variance, denoted by error (horse). The resulting
analysis of variance table is displayed in Table 5.32.

TABLE 5.31
Second Assignment of Treatment Combinations for Horse Feet Experiment

                    Horse
1           2           3           4
F1P1T1      F2P1T1      F1P1T2      F2P1T2
F2P2T2      F1P2T2      F2P2T1      F1P2T1

TABLE 5.32
Between-Horse and Within-Horse Analysis of Variance Table

Source               df    Experimental unit
Between-Horse         7
  F x P               1    Horse
  F x T               1    Horse
  P x T               1    Horse
  Error (horse)       4    Horse
Within-Horse          8
  F                   1    Foot of horse
  T                   1    Foot of horse
  P                   1    Foot of horse
  F x P x T           1    Foot of horse
  Error (foot)        4    Foot of horse
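Whether a factorial effect is a within-horse or a between-horse comparison depends only on the signs that the effect gives to the two feet of each horse: opposite signs on every horse make the horse effect cancel. The sketch below applies that rule to both assignments as read from Tables 5.30 and 5.31; the function names are hypothetical.

```python
from itertools import combinations

# horse -> the two (F, P, T) level combinations applied to its feet
design_1 = {1: [(1, 1, 1), (1, 2, 2)], 2: [(1, 1, 2), (1, 2, 1)],
            3: [(2, 1, 1), (2, 2, 2)], 4: [(2, 1, 2), (2, 2, 1)]}
design_2 = {1: [(1, 1, 1), (2, 2, 2)], 2: [(2, 1, 1), (1, 2, 2)],
            3: [(1, 1, 2), (2, 2, 1)], 4: [(2, 1, 2), (1, 2, 1)]}

def effect_sign(combo, effect):
    """Sign (+1 or -1) of a treatment combination in a factorial contrast."""
    sign = 1
    for factor in effect:
        level = combo["FPT".index(factor)]
        sign *= 1 if level == 1 else -1
    return sign

def classify(design):
    """Split the seven factorial effects into within- and between-horse sets."""
    effects = ["".join(c) for r in (1, 2, 3) for c in combinations("FPT", r)]
    within, between = [], []
    for effect in effects:
        # Within-horse: the two feet of every horse carry opposite signs.
        cancels = all(effect_sign(a, effect) + effect_sign(b, effect) == 0
                      for a, b in design.values())
        (within if cancels else between).append(effect)
    return within, between

print(classify(design_1))
print(classify(design_2))
```

With eight horses, two per column of the second assignment, the seven between-horse degrees of freedom leave 7 - 3 = 4 for error (horse) and the eight within-horse degrees of freedom leave 8 - 4 = 4 for error (foot).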

This designed experiment falls into the class of repeated measures designs since there
are two measurements (repeated) on each horse; that is, a front foot and a rear foot are
measured on each horse, and the levels of position cannot be randomly assigned to its
experimental units.

5.5.2 Example 5.7: Comfort Study—Repeated Measures Design 

An experimenter wants to study the effect of six environmental conditions on the comfort 
of people. He has six environmental chambers, and each can be set with a different 
environmental condition. The experiment consists of putting one person in a chamber and 
then measuring the person's comfort after 1, 2, and 3 hours of exposure. There are 36 sub¬ 
jects in the study where six subjects are randomly assigned to each environment. The 
researcher can obtain data from six environmental conditions during one day, thus days 
are used as a blocking factor where each environmental condition is assigned to one of the 
six chambers each day. Time of exposure is an important factor in the study, thus the treat¬ 
ment structure is a two-way with the levels of environment crossed with the three expo¬ 
sure times. The subjects were randomly assigned a number from 1 to 36, where the first six 
persons were involved in the experiment during the first day, etc. An assignment of envi¬ 
ronments and persons to chambers is displayed in Figure 5.23. Each rectangle in Figure 
5.23 represents a person and 7j, T 2 , and T 3 represent the measures of comfort after 1,2, and 
3 hours of exposure, respectively. The experimental unit for environments is a person or 
chamber and the experimental unit for time is a 1 h interval "within" that person. In effect, 
the person is "split" into three parts, but this design structure is called a repeated measures 
design rather than a split-plot design since the levels of time of exposure cannot be ran¬ 
domly assigned to the three 1 h exposure times within a person. The structure of this design 
is identical to the usual split-plot with a two-way treatment structure in a randomized 
complete block whole-plot design structure. For this repeated measures design structure 
the larger-sized experimental unit or person or whole plot design consists of a one-way 
treatment structure (six levels of environment) in a randomized complete block design 

[Figure 5.23 layout: a 6 x 6 grid indexed by day (Day 1 to Day 6) and environment (1 to 6). Each 
cell is one person, numbered (1) to (36) with six persons per day, and holds that person's three 
comfort measurements $T_1$, $T_2$, $T_3$.] 


FIGURE 5.23 Data arrangement for the comfort study. 


structure (six days). The smaller-sized experimental unit or 1 h time interval or split-plot 
design is a one-way treatment structure (three times of exposure) in a randomized complete 
block design structure where each person or chamber is a block. A model that can be 
used to describe data from this experiment is 

$$y_{ijk} = \mu + E_i + d_j + c_{ij} + T_k + (ET)_{ik} + \varepsilon_{ijk}, \quad i = 1, 2, \ldots, 6, \; j = 1, 2, \ldots, 6, \; k = 1, 2, 3$$

where $d_j$ denotes the $j$th day effect, $c_{ij}$ denotes the effect of the chamber or person used for 
the $i$th environment during the $j$th day, and $\varepsilon_{ijk}$ is the time interval effect. If there is one 
observation made on each person, the resulting data set would be from a one-way treatment 
structure in a randomized complete block design structure. The chamber or person 
error term is computed from the day by environment interaction. If one compares the 
times at the first environmental condition, the resulting design is a one-way treatment 
structure in a randomized complete block design structure where the person or chamber 
or day is the blocking factor. The time interval error term is computed from the time 
by day interaction. An error term for time intervals can be computed for each of the environments 
and can be pooled into the time interval error, if the variances are equal. The 
analysis of variance table for this repeated measures design is given in Table 5.33. This 
basic repeated measures structure is identical to that of the usual split-plot design. 
The assumptions made for the split-plot model are that the $d_j$, $c_{ij}$, and $\varepsilon_{ijk}$ are independent 
random variables with variances $\sigma^2_{day}$, $\sigma^2_{chamber}$, and $\sigma^2_{\varepsilon}$. The assumptions for the repeated 
measures design are that $d_j$, $c_{ij}$, and $\varepsilon_{ijk}$ are independent of one another, but since the $\varepsilon_{ijk}$ 
for a given person are measured on the same person, they can be correlated with each other, and 
that correlation structure may need to be modeled. The modeling of the covariance structure is 
discussed in Chapters 26 and 27. 
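
The degrees of freedom in Table 5.33 follow from the counts given in this example (six days, six environments, three exposure times). The Python sketch below is only a bookkeeping aid under those counts; the variable names are illustrative and not part of the original analysis.

```python
# Degrees-of-freedom bookkeeping for the comfort study (Example 5.7).
# Counts come from the text; all names here are illustrative.
n_days, n_env, n_times = 6, 6, 3

df = {
    "days": n_days - 1,
    "environment": n_env - 1,
    # Person (chamber) error: the day by environment interaction.
    "error_person": (n_days - 1) * (n_env - 1),
    "time": n_times - 1,
    "environment_x_time": (n_env - 1) * (n_times - 1),
}
# Time-interval error: the time by day interaction within each environment,
# pooled over the six environments.
df["error_hour_interval"] = n_env * (n_times - 1) * (n_days - 1)

n_obs = n_days * n_env * n_times      # 36 persons x 3 measurements each
assert sum(df.values()) == n_obs - 1  # the pieces account for every df

for source, value in df.items():
    print(f"{source:22s}{value:4d}")
```

The pooled time-interval error is the line that is easiest to get wrong: it is built separately within each environment and then pooled, which is why the number of environments multiplies the product rather than entering as (6 - 1).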

An alternative to assigning one person to each environmental chamber is to assign a 
group of six people to a single environmental chamber and complete the study in one day. 
This assignment process seems like a good idea, but the resulting data set does not provide 
any measure as to how chambers vary when treated alike. The nonexistence of the chamber 

TABLE 5.33 
Analysis of Variance Table for the Comfort Study 

Source                    df   EMS 
Days                       5   $\sigma^2_{\varepsilon} + 3\sigma^2_{chamber} + 18\sigma^2_{day}$ 
Person Analysis 
  Environment              5   $\sigma^2_{\varepsilon} + 3\sigma^2_{chamber} + \phi^2(E)$ 
  Error (person)          25   $\sigma^2_{\varepsilon} + 3\sigma^2_{chamber}$ 
Hour-Interval Analysis 
  Time                     2   $\sigma^2_{\varepsilon} + \phi^2(T)$ 
  Environment x time      10   $\sigma^2_{\varepsilon} + \phi^2(ET)$ 
  Error (hour-interval)   60   $\sigma^2_{\varepsilon}$ 

error term is because the group of six people forms the experimental unit for environments, 
and thus there is only one independent observation for each environment. Without an error 
term for chambers, there is no way to assess the effects of environment on comfort. 


5.5.3 Example 5.8: Crossover or Change-Over Designs 

A useful method for comparing treatments to be administered to subjects (or objects) is to 
allow the subject to serve as its own control by applying two or more of the treatments to 
the same subject. If there are two treatments, the process is to assign treatment A to be 
applied in the first time period and measure the response, allow the effect of treatment A 
to diminish or wash out, and then apply treatment B to the subject in the second time 
period and observe the response to treatment B. The randomization process is to construct 
sequences of the treatments (A followed by B and B followed by A, here) and then randomly 
assign subjects or animals to the sequences. This approach can also be used on plants, 
plots of land, or other objects where after subjecting an experimental unit to a treatment, 
the experimental unit can at least partially recover from the effects of the treatment given 
in the first time period. 

In this method of comparing two treatments, there are two sequences of treatment 
assignments for an animal: A followed by B, and B followed by A. The two sequences are 
often denoted by the AB sequence and the BA sequence. The treatment structure is a one-way 
set of treatments with two levels (A and B), but since the treatments are applied in 
sequence to each experimental unit, the generated sequence becomes another type of treatment. 
Thus the design of this experiment involves a two-way treatment structure with 
treatments crossed with sequences. 

This crossover design is a repeated measures design with two sizes of experimental 
units. The treatment sequence is assigned to a subject so that the subject is the larger-sized 
experimental unit. The time periods or times during which the treatments are 
observed are the smaller-sized experimental unit. The design for the large-sized experimental 
units is a one-way treatment structure with two levels of sequence (the two possible 
sequences) in a completely randomized design structure. Although any design structure 
can be used, a completely randomized design is used most often. The design for the small-sized 
experimental units (time interval) is a one-way treatment structure (two time periods 
or two treatments) in a randomized complete block design structure where the subjects are 

TABLE 5.34 
Data Arrangement for a Two-Period Two-Treatment Crossover Design 

                          Animal 
Sequence 1      1           2           ...   $n_1$ 
  A (time 1)    $y_{11A}$   $y_{12A}$   ...   $y_{1n_1A}$ 
  B (time 2)    $y_{11B}$   $y_{12B}$   ...   $y_{1n_1B}$ 

Sequence 2      1           2           ...   $n_2$ 
  B (time 1)    $y_{21B}$   $y_{22B}$   ...   $y_{2n_2B}$ 
  A (time 2)    $y_{21A}$   $y_{22A}$   ...   $y_{2n_2A}$ 

the blocks. The data can be arranged as shown in Table 5.34 and a model that can be used 
to describe the data is 

$$y_{ijk} = \mu_{ik} + s_{ij} + \varepsilon_{ijk}, \quad (i, j, k) \in I_D, \quad s_{ij} \sim \text{i.i.d. } N(0, \sigma^2_{subject}), \quad \varepsilon_{ijk} \sim \text{i.i.d. } N(0, \sigma^2_{time})$$

where $I_D$ is an index set with the collection of $(i, j, k)$ triples actually used in the experiment, 
$\mu_{ik}$ denotes the effect of the $k$th treatment in the $i$th sequence, $s_{ij}$ denotes the random effect 
of the $j$th subject assigned to the $i$th sequence, and $\varepsilon_{ijk}$ denotes the random error of a 
measurement in the time period of the $i$th sequence to which the $k$th treatment was applied. 
The analyses of some crossover designs are presented in Chapter 29. 
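
To make the "subject as its own control" idea concrete, the following Python sketch simulates data from the crossover model above and forms the within-subject difference A minus B. The means, standard deviations, and function names are hypothetical choices for illustration; the time-period error is set to zero here so that the cancellation of the subject effect is visible exactly.

```python
import random

# Simulate a two-period, two-treatment crossover under the model
# y_ijk = mu_ik + s_ij + e_ijk. All numeric values are made up.
def simulate_crossover(mu, n_per_seq, sd_subject, sd_time, seed=0):
    rng = random.Random(seed)
    rows = []
    for seq in mu:                              # seq is "AB" or "BA"
        for subj in range(1, n_per_seq + 1):
            s = rng.gauss(0.0, sd_subject)      # subject effect, shared by both periods
            for period, trt in enumerate(seq, start=1):
                e = rng.gauss(0.0, sd_time)     # time-period error
                rows.append((seq, subj, period, trt, mu[seq][trt] + s + e))
    return rows

def within_subject_differences(rows):
    # A minus B for each subject; the subject effect s_ij cancels.
    by_subject = {}
    for seq, subj, _period, trt, y in rows:
        by_subject.setdefault((seq, subj), {})[trt] = y
    return {key: obs["A"] - obs["B"] for key, obs in by_subject.items()}

mu = {"AB": {"A": 10.0, "B": 12.0}, "BA": {"A": 10.5, "B": 12.5}}
rows = simulate_crossover(mu, n_per_seq=4, sd_subject=5.0, sd_time=0.0)
diffs = within_subject_differences(rows)
for key, d in sorted(diffs.items()):
    print(key, round(d, 6))
```

Even with a large subject standard deviation, every difference equals the treatment difference built into the hypothetical means, because both observations on a subject carry the same $s_{ij}$. With a nonzero `sd_time`, the differences would vary only through the time-period error.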


5.6 Designs Involving Nested Factors 

In a given design, it is possible to have nested effects in the design structure, in the treatment 
structure, or in both structures. Nesting occurs most often in the design structure of 
an experiment where a smaller-sized experimental unit is nested within a larger-sized one. 
One size of experimental unit is nested within a larger size if the smaller experimental 
units are different for each large experimental unit. When the design structure consists of 
several different sizes of experimental units and there is an ordering where the smallest 
size is nested within the next smallest size, and so on up to the next-to-largest size being 
nested within the largest size, the design is also called a hierarchical design structure. 
The split-plot and repeated measures designs discussed in the previous sections of this 
chapter are good examples of design structures where the smaller-sized experimental 
units are nested within the larger-sized experimental units. Nesting occurs in the treatment 
structure when the levels of one factor occur with only one level of a second factor. 
In that case, the levels of the first factor are said to be nested within the levels of the second 
factor. Table 5.35 illustrates nesting in the design structure for the second part of Example 
4.2 where houses are nested within squares (an "X" indicates that a house is included in 
the indicated square). Each square is the large experimental unit or block of houses, and 
the house is the smaller size of experimental unit. Since the houses for the first square are 
different from the houses for the second square, houses are nested within squares. Such a 
nested effect is often expressed in a model by $s_k + h_{m(k)}$, where $s_k$ denotes the effect of the $k$th 

TABLE 5.35 
Design Showing Houses Nested within Squares 

                     Houses 
Square    1   2   3   4   5   6   7   8 
1         X   X   X   X 
2                         X   X   X   X 


square and $h_{m(k)}$ denotes the effect of the $m$th house nested in the $k$th square. The sums of 
squares for squares and houses within squares are denoted by SSSQUARES and 
SSHOUSES(SQUARES), respectively. If houses and not squares are included in the model, 
then there is only a single sum of squares due to houses (SSHOUSES) which can be 
partitioned as SSHOUSES = SSSQUARES + SSHOUSES(SQUARES). This process can be 
carried on for one more step where the sides of the houses are nested within houses that 
are nested within squares. Thus these three sizes of experimental units form a hierarchical 
design structure. 
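
The partition SSHOUSES = SSSQUARES + SSHOUSES(SQUARES) can be checked numerically. The Python sketch below uses made-up readings (two per house, two squares of four houses each); the data and variable names are assumptions for illustration only and do not come from Example 4.2.

```python
# Check SSHOUSES = SSSQUARES + SSHOUSES(SQUARES) on made-up data:
# 2 squares, 4 houses per square, 2 hypothetical readings per house.
data = {
    1: {1: [20.1, 19.7], 2: [21.4, 22.0], 3: [18.9, 19.3], 4: [20.6, 20.2]},
    2: {5: [23.5, 24.1], 6: [22.8, 22.2], 7: [25.0, 24.4], 8: [23.1, 23.9]},
}

def mean(values):
    return sum(values) / len(values)

all_obs = [y for houses in data.values() for ys in houses.values() for y in ys]
grand = mean(all_obs)

ss_houses = 0.0             # houses, ignoring squares
ss_squares = 0.0            # between squares
ss_houses_in_squares = 0.0  # houses within squares
for houses in data.values():
    square_obs = [y for ys in houses.values() for y in ys]
    square_mean = mean(square_obs)
    ss_squares += len(square_obs) * (square_mean - grand) ** 2
    for ys in houses.values():
        house_mean = mean(ys)
        ss_houses += len(ys) * (house_mean - grand) ** 2
        ss_houses_in_squares += len(ys) * (house_mean - square_mean) ** 2

print(ss_houses, ss_squares, ss_houses_in_squares)
assert abs(ss_houses - (ss_squares + ss_houses_in_squares)) < 1e-9
```

The identity holds because each square mean is the average of the house means in that square, so the cross-product term between the two pieces sums to zero.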

5.6.1 Example 5.9: Animal Genetics 

An animal scientist wants to study the effect of genetics on the growth rate of lambs. She 
has four males (sires) and 12 females (dams). The breeding structure is shown in Table 5.36 
(an "X" denotes a mating). For this example, each sire is mated to three dams where the 
three dams are different for each sire. Thus, the levels of dam are called a nested factor 
where dams are nested within sires. If the animal scientist is interested in the effect of this 
set of sires with this set of dams, then the levels of dams nested within the levels of sires 
form a nested treatment structure. 

When nesting occurs in the treatment structure, the treatment structure must consist 
of at least two factors. In this case, each level of the nested factor occurs just once with a 
level or levels of the other factor. The next two examples demonstrate nesting in a treatment 
structure. 


5.6.2 Example 5.10: Soybeans in Maturity Groups 

The agricultural crop of soybeans provides a great example of nesting in the treatment 
structure. Varieties of soybeans are classified into maturity groups. In the Midwest region 


TABLE 5.36 
Breeding Structure Showing Dams Nested within Sires 

                              Dams 
Sires    1   2   3   4   5   6   7   8   9   10  11  12 
1        X   X   X 
2                    X   X   X 
3                                X   X   X 
4                                            X   X   X 


TABLE 5.37 
Treatment Structure of the Soybean Study with Varieties Nested within Maturity Groups 

                            Varieties 
Maturity Group    1   2   3   4   5   6   7   8 
4                 X   X 
5                         X   X   X   X 
6                                         X   X 


of the United States, varieties of soybeans in maturity groups 4, 5, and 6 are grown, but 
varieties from one maturity group may be better for a particular region than a variety of 
another maturity group. A study was designed to evaluate eight soybean varieties where 
two were from maturity group 4, four were from maturity group 5, and two were from maturity 
group 6. Table 5.37 is a display of the treatment structure, indicating the levels of varieties 
are nested within the levels of maturity. A model that can be used to describe data from 
this nested treatment structure in a randomized complete block design structure with four 
blocks is 


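$$y_{ijk} = \mu + M_i + V(M)_{j(i)} + b_k + \varepsilon_{ijk}, \quad i = 1, 2, 3, \; j = 1, \ldots, n_i, \; k = 1, 2, 3, 4$$
$$b_k \sim \text{i.i.d. } N(0, \sigma^2_{block}), \quad \varepsilon_{ijk} \sim \text{i.i.d. } N(0, \sigma^2_{\varepsilon}), \quad n_1 = n_3 = 2, \text{ and } n_2 = 4$$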

where $M_i$ denotes the effect of the $i$th maturity group, $V(M)_{j(i)}$ denotes the effect of the $j$th 
variety nested within the $i$th maturity group, $b_k$ denotes the effect of the $k$th block, and $\varepsilon_{ijk}$ 
denotes the experimental unit error. The analysis of variance table for the above model is 
given in Table 5.38 where the error term is computed as the variety by block interaction. 
The variety by block interaction can be partitioned as the maturity group by block interaction 
plus the variety nested within maturity group by block interaction. 
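
As a check on the error degrees of freedom, the partition of the variety by block interaction described above can be written out using the counts of this example (eight varieties, three maturity groups, four blocks). This is a worked arithmetic sketch rather than part of the original table:

```latex
\begin{aligned}
\text{df(variety} \times \text{block)} &= (8 - 1)(4 - 1) = 21, \\
\text{df(maturity group} \times \text{block)} &= (3 - 1)(4 - 1) = 6, \\
\text{df(variety(maturity group)} \times \text{block)} &= (8 - 3)(4 - 1) = 15, \\
21 &= 6 + 15.
\end{aligned}
```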

One problem with the above design is that the plots need to be harvested at different 
times, as maturity group 4 beans are ready for harvest before maturity group 5 beans, 
which are ready to be harvested before maturity group 6 beans. An alternative to the randomized 
complete block design is to use a split-plot design where the whole plots are 
formed by the maturity groups and the subplots are formed by the varieties within a 
maturity group. The advantage of the split-plot design is that all varieties within a whole 
plot could be harvested at the same time. The randomization process is to randomly assign 
the maturity group levels to the whole plots (sets of plots) within each block and then randomly assign 


TABLE 5.38 
Analysis of Variance Table for Soybean Varieties Nested within 
Maturity Groups in a Randomized Complete Block Design Structure 

Source                         df   EMS 
Blocks                          3   $\sigma^2_{\varepsilon} + 8\sigma^2_{block}$ 
Maturity groups                 2   $\sigma^2_{\varepsilon} + \phi^2(M)$ 
Varieties (maturity groups)     5   $\sigma^2_{\varepsilon} + \phi^2[V(M)]$ 
Error                          21   $\sigma^2_{\varepsilon}$ 

the respective varieties to the plots within a maturity group. A model for this split-plot 
design structure is 

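$$y_{ijk} = \mu + M_i + b_k + w_{ik} + V(M)_{j(i)} + \varepsilon_{ijk}, \quad i = 1, 2, 3, \; j = 1, \ldots, n_i, \; k = 1, 2, 3, 4$$
$$b_k \sim \text{i.i.d. } N(0, \sigma^2_{block}), \quad w_{ik} \sim \text{i.i.d. } N(0, \sigma^2_{wplot}), \quad \text{and} \quad \varepsilon_{ijk} \sim \text{i.i.d. } N(0, \sigma^2_{\varepsilon})$$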

where $w_{ik}$ denotes the whole-plot effect and $\varepsilon_{ijk}$ denotes the subplot effect. The whole plot 
design is a one-way treatment structure (three levels of maturity) in a randomized complete 
block design structure where the whole-plot error is computed from the block by 
maturity interaction. The subplot design consists of three separate one-way treatment 
structures in randomized complete block design structures, one for each maturity group. 
The whole plots are of different sizes since there are different numbers of varieties within 
each maturity group. The analysis of variance table for this two-way nested treatment 
structure in a split-plot design structure is in Table 5.39. 
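
The two-stage randomization for this split-plot design can be sketched in Python. This is a minimal illustration, assuming that maturity groups are randomized to whole plots within each block and that varieties are then randomized to subplots within their own maturity group; the function name, the seed, and the data layout are hypothetical.

```python
import random

# Varieties nested within maturity groups (as in Table 5.37).
varieties_by_group = {4: [1, 2], 5: [3, 4, 5, 6], 6: [7, 8]}

def randomize_split_plot(n_blocks, seed=None):
    """Return {block: [(maturity_group, [varieties in plot order]), ...]}."""
    rng = random.Random(seed)
    layout = {}
    for block in range(1, n_blocks + 1):
        # Step 1: a random order of maturity groups gives the whole-plot assignment.
        groups = list(varieties_by_group)
        rng.shuffle(groups)
        whole_plots = []
        for group in groups:
            # Step 2: a random order of that group's varieties within its whole plot.
            subplots = varieties_by_group[group][:]
            rng.shuffle(subplots)
            whole_plots.append((group, subplots))
        layout[block] = whole_plots
    return layout

layout = randomize_split_plot(n_blocks=4, seed=2024)
for block, whole_plots in layout.items():
    print(block, whole_plots)
```

Each block receives every maturity group once, and a variety never leaves its own maturity group, which is what lets all plots within a whole plot be harvested together.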


5.6.3 Example 5.11: Engines on Aircraft 

An aircraft company wants to evaluate the performance of seven engine types with three 
aircraft types. Because of certain mechanical characteristics, only certain engines can be 
used with each type of aircraft. The possible engine-aircraft configurations (marked by 
an "X") are displayed in Table 5.40. 

As shown in Table 5.40, each engine type can occur with one and only one aircraft type, 
thus the levels of engines are nested within the levels of aircraft. The aircraft company made 
three aircraft for each of the seven treatment combinations (engine types by aircraft types). 
The data collection process is to randomly order the 21 engine type-aircraft type configurations, 
have a test pilot fly the planes in the random order, and measure a performance 
characteristic. Let $y_{ijk}$ denote a performance measure of the $k$th airplane made from the $i$th 
aircraft type with the $j$th engine type. Models that can be used to describe the performance 
measures expressed as a means model and as an effects model are 

$$y_{ijk} = \mu_{ij} + p_{ijk}, \quad \text{for } (i, j) \in Q \text{ and } p_{ijk} \sim \text{i.i.d. } N(0, \sigma^2_{plane})$$

or 

$$y_{ijk} = \mu + A_i + E_{j(i)} + p_{ijk}, \quad \text{for } (i, j) \in Q,$$

where 

$$Q = \{(1, A), (1, B), (1, C), (2, D), (2, E), (3, F), (3, G)\}$$


TABLE 5.39 
Analysis of Variance Table for Soybean Varieties Nested within 
Maturity Groups in a Split-Plot Design Structure 

Source                         df   EMS 
Blocks                          3   $\sigma^2_{\varepsilon} + 2.5\sigma^2_{wplot} + 7.2\sigma^2_{block}$ 
Maturity groups                 2   $\sigma^2_{\varepsilon} + 2.5\sigma^2_{wplot} + \phi^2(M)$ 
Error (whole-plot)              6   $\sigma^2_{\varepsilon} + 2.5\sigma^2_{wplot}$ 
Varieties (maturity groups)     5   $\sigma^2_{\varepsilon} + \phi^2[V(M)]$ 
Error (subplot)                15   $\sigma^2_{\varepsilon}$ 


TABLE 5.40 
Observable Engine-Aircraft Configurations of the Nested Treatment Structure 

                       Engine Type 
Aircraft Type    A   B   C   D   E   F   G 
1                X   X   X 
2                            X   X 
3                                    X   X 

An analysis of variance table for the effects model is displayed in Table 5.41. The plane 
error term is computed from the variation of the performance scores of the three planes 
made with the same engine type and aircraft type configuration pooled across the seven 
configurations. In terms of the means model, the aircraft sum of squares tests the equality 
of the aircraft means or tests the null hypothesis 


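$$\bar{\mu}_{1\cdot} = \bar{\mu}_{2\cdot} = \bar{\mu}_{3\cdot}$$

or 

$$\frac{\mu_{1A} + \mu_{1B} + \mu_{1C}}{3} = \frac{\mu_{2D} + \mu_{2E}}{2} = \frac{\mu_{3F} + \mu_{3G}}{2}$$

The engine nested within aircraft sum of squares tests the null hypothesis 

$$\mu_{1A} = \mu_{1B} = \mu_{1C}, \quad \mu_{2D} = \mu_{2E}, \quad \text{and} \quad \mu_{3F} = \mu_{3G}$$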

One has to be quite careful when nesting occurs in the treatment structure because there 
is a tendency to carry out the analysis as if there is a nested or hierarchical design structure. 
The individual airplanes are the only experimental units for this study, that is, there is just 
one size of experimental unit and only one error term. When the nesting occurs in the 
design structure, there is more than one size of experimental unit and thus more than one 
error term in the model. The next example illustrates nesting in the design structure. 
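
The aircraft means in the first hypothesis, and the degrees of freedom for this single-error-term analysis, can be computed directly from the set of observable configurations. The Python sketch below uses hypothetical cell means purely for illustration; only the configuration structure and the three planes per configuration come from the example.

```python
# Engine types available for each aircraft type (the configurations of Table 5.40).
engines_by_aircraft = {1: ["A", "B", "C"], 2: ["D", "E"], 3: ["F", "G"]}
planes_per_config = 3

# Hypothetical cell means mu_ij, for illustration only.
cell_means = {(1, "A"): 50.0, (1, "B"): 53.0, (1, "C"): 56.0,
              (2, "D"): 48.0, (2, "E"): 52.0,
              (3, "F"): 61.0, (3, "G"): 59.0}

# Aircraft means: unweighted averages of the cell means within each aircraft type.
aircraft_means = {
    a: sum(cell_means[(a, e)] for e in engines) / len(engines)
    for a, engines in engines_by_aircraft.items()
}

n_aircraft = len(engines_by_aircraft)
n_configs = sum(len(e) for e in engines_by_aircraft.values())
df_aircraft = n_aircraft - 1
# Engines are compared only within their own aircraft type.
df_engine_in_aircraft = sum(len(e) - 1 for e in engines_by_aircraft.values())
# One error term: planes within configuration, pooled over configurations.
df_error = n_configs * (planes_per_config - 1)
n_planes = n_configs * planes_per_config

print(aircraft_means)
print(df_aircraft, df_engine_in_aircraft, df_error)
```

Because there is only one size of experimental unit, the three lines of degrees of freedom add to the total for 21 planes with a single error term.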


TABLE 5.41 
Analysis of Variance Table for Two-Way Nested Treatment Structure 
in a Completely Randomized Design Structure 

Source                         df   EMS 
Aircraft type                   2   $\sigma^2_{plane} + \phi^2(A)$ 
Engine type (aircraft type)     4   $\sigma^2_{plane} + \phi^2[E(A)]$ 
Error (plane)                  14   $\sigma^2_{plane}$ 


5.6.4 Example 5.12: Simple Comfort Experiment 

A comfort experiment was conducted to study the effects of temperature (three levels, 18, 
21, and 24°C) and sex of person [two levels, male (M) and female (F)] in a two-way treatment 
structure on a person's comfort. There are several methods to measure a person's 
comfort, and so for the discussion here, assume that there is one comfort measurement 
made on each person. The three temperatures were each randomly assigned to three of the 
nine available environmental chambers. A chamber is the experimental unit for the levels 
of temperature and the chamber design is a one-way treatment structure in a completely 
randomized design structure. 

Eighteen males and 18 females were randomly assigned to chambers so that two males 
and two females were assigned to each of the nine chambers. The experimental unit for 
sex of person is a person, and the person design is a one-way treatment structure in a randomized 
complete block design structure where the chambers are the blocks. There are 
two replications on each level of sex within each block. The assignment of chambers to 
temperatures and of persons to chambers is displayed in Figure 5.24. 

After the people were subjected to the environmental condition for 3 h, their comfort was 
measured. A means model and an effects model that can be used to describe these data are 


$$y_{ijkm} = \mu_{ik} + c_{j(i)} + p_{m(ijk)}, \quad \text{for } i = 1, 2, 3, \; j = 1, 2, 3, \; k = 1, 2, \; m = 1, 2,$$
$$c_{j(i)} \sim \text{i.i.d. } N(0, \sigma^2_{chamber}) \quad \text{and} \quad p_{m(ijk)} \sim \text{i.i.d. } N(0, \sigma^2_{person})$$

or 

$$y_{ijkm} = \mu + T_i + c_{j(i)} + S_k + (TS)_{ik} + p_{m(ijk)}$$


Numbers within boxes denote person numbers of a given sex. 

FIGURE 5.24 Assignments of persons and temperatures to chambers for simple comfort experiment. 


TABLE 5.42 
Analysis of Variance Table for the Simple Comfort Experiment 
with a Nested Design Structure 

Source               df   EMS 
Temperature           2   $\sigma^2_{person} + 4\sigma^2_{chamber} + \phi^2(T)$ 
Error (chamber)       6   $\sigma^2_{person} + 4\sigma^2_{chamber}$ 
Sex                   1   $\sigma^2_{person} + \phi^2(S)$ 
Sex x temperature     2   $\sigma^2_{person} + \phi^2(ST)$ 
Error (person)       24   $\sigma^2_{person}$ 


where $\mu_{ik}$ denotes the mean of the $i$th temperature and $k$th sex, $c_{j(i)}$ is the random effect of 
the $j$th chamber assigned to the $i$th temperature, $p_{m(ijk)}$ denotes the random effect of the $m$th 
person of the $k$th sex assigned to the $j$th chamber assigned to the $i$th temperature, and $T_i$, 
$S_k$, and $(TS)_{ik}$ denote the effects of temperature, sex, and temperature by sex interaction, 
respectively. 

There are two levels of nesting involved in this experiment. First, environmental chambers 
are nested within temperatures. Second, persons of the same sex are nested within 
chambers. The analysis of variance table for this experiment is shown in Table 5.42. The 
effects of sex and the sex x temperature interaction are between-person comparisons, 
and thus the person error term is used for making comparisons among those means. The 
chamber error term is used to make comparisons between temperature means. 
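
One way to see where the 24 degrees of freedom for Error (person) come from, using the counts of this example (three temperatures, three chambers per temperature, two sexes, two persons of each sex per chamber), is the following arithmetic sketch. The split into two pieces is offered as an illustration and is not shown in Table 5.42 itself:

```latex
\begin{aligned}
\text{df(sex} \times \text{chamber(temperature))} &= (2 - 1) \cdot 3(3 - 1) = 6, \\
\text{df(persons within sex} \times \text{chamber)} &= 9 \cdot 2 \cdot (2 - 1) = 18, \\
\text{df(Error (person))} &= 6 + 18 = 24, \\
2 + 6 + 1 + 2 + 24 &= 35 = 36 - 1.
\end{aligned}
```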


5.6.5 Example 5.13: Multilocation Study with Repeated Measures 

A typical agricultural study involves evaluating treatments at several locations and often 
the response is measured over time. This study is to evaluate four varieties of alfalfa 
using three locations. Alfalfa is a crop that is harvested three or more times during the 
growing season, so the data set contains repeated measures. At each location, the design 
is a one-way treatment structure in a randomized complete block design structure 
with three replications. The assignment of varieties to plots at each location is displayed in 
Figure 5.25. Figure 5.26 is a graphic of the experimental units at each of the locations. 
Since all of the plots at a location are cut or harvested at the same time, all of the plots 
at a location are subjected to the same environmental conditions that happen to occur 
between cuttings. Thus, it is the whole experiment at a given location that is being 
repeatedly measured and not the individual plots. A column in Figure 5.26 represents the 
entity to which a variety is applied and harvested three times and is the experimental 
unit for varieties. A layer of plots represents the entity which is subjected to the environmental 
conditions and is the experimental unit for the levels of cutting. The levels of cutting 
form a strip-plot structure across all of the plots at a given location. The cell denotes 
the unit of a variety being harvested at one cutting time. 

The analysis for the columns can be constructed by ignoring the layers, which provides 
a one-way treatment structure in a randomized complete block design structure at each 
location. Thus the column error is computed by the variety by block interaction pooled 
across locations. The analysis for the layers can be constructed by ignoring the columns, 
which provides a one-way treatment structure (levels of cutting) in a randomized complete 
block design structure where locations are the blocks. The layer error term is computed 
as the cutting by location interaction. Finally, the cell is the unit for the cutting by 

[Figure 5.25 layout: at each of Locations 1, 2, and 3, three blocks labeled Rep 1, Rep 2, and 
Rep 3, nested within the location.] 

FIGURE 5.25 Assignment of alfalfa varieties to plots for multi-location experiment. 

[Figure 5.26 labels: the column is the experimental unit for variety; the cell is the 
experimental unit for cutting by variety.] 


variety interaction and the cell error term is computed as the cutting by variety by block 
interaction pooled across locations plus the cutting by block interaction nested within 
locations. A model that can be used to describe a response from each cell is 


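$$y_{ijkm} = \mu + l_i + b_{j(i)} + V_k + (lV)_{ik} + (bV)_{jk(i)} + C_m + (lC)_{im} + (VC)_{km} + (lVC)_{ikm} + \varepsilon_{ijkm}$$

where 

$$l_i \sim \text{i.i.d. } N(0, \sigma^2_{Loc}), \quad b_{j(i)} \sim \text{i.i.d. } N(0, \sigma^2_{Blk(Loc)}), \quad (lV)_{ik} \sim \text{i.i.d. } N(0, \sigma^2_{V*Loc}), \quad (bV)_{jk(i)} \sim \text{i.i.d. } N(0, \sigma^2_{V*Blk(Loc)}),$$
$$(lC)_{im} \sim \text{i.i.d. } N(0, \sigma^2_{C*Loc}), \quad (lVC)_{ikm} \sim \text{i.i.d. } N(0, \sigma^2_{V*C*Loc}), \quad \varepsilon_{ijkm} \sim \text{i.i.d. } N(0, \sigma^2_{cell})$$


TABLE 5.43 
Analysis of Variance Table for the Multilocation Study with Repeated Measures 

Source                                           df   EMS 
Locations                                         2   $\sigma^2_{cell} + 12\sigma^2_{Blk(Loc)} + 3\sigma^2_{V*Blk(Loc)} + 3\sigma^2_{V*C*Loc} + 12\sigma^2_{C*Loc} + 9\sigma^2_{V*Loc} + 36\sigma^2_{Loc}$ 
Blocks (location)                                 6   $\sigma^2_{cell} + 3\sigma^2_{V*Blk(Loc)} + 12\sigma^2_{Blk(Loc)}$ 
Varieties                                         3   $\sigma^2_{cell} + 3\sigma^2_{V*Blk(Loc)} + 3\sigma^2_{V*C*Loc} + 9\sigma^2_{V*Loc} + \phi^2(V)$ 
Locations x varieties                             6   $\sigma^2_{cell} + 3\sigma^2_{V*Blk(Loc)} + 3\sigma^2_{V*C*Loc} + 9\sigma^2_{V*Loc}$ 
Varieties x blocks (location) = error (column)   18   $\sigma^2_{cell} + 3\sigma^2_{V*Blk(Loc)}$ 
Cuttings                                          2   $\sigma^2_{cell} + 3\sigma^2_{V*C*Loc} + 12\sigma^2_{C*Loc} + \phi^2(C)$ 
Location x cuttings = error (layer)               4   $\sigma^2_{cell} + 3\sigma^2_{V*C*Loc} + 12\sigma^2_{C*Loc}$ 
Varieties x cuttings                              6   $\sigma^2_{cell} + 3\sigma^2_{V*C*Loc} + \phi^2(VC)$ 
Location x varieties x cuttings                  12   $\sigma^2_{cell} + 3\sigma^2_{V*C*Loc}$ 
Error (cell)                                     48   $\sigma^2_{cell}$ 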

The model assumptions treat locations and blocks nested within locations as random 
effects (see Chapter 18). As a consequence, all interactions 
involving location or blocks nested within location are also random effects. The levels of 
cutting at each location are the repeated measurements, as cutting 1 is first, followed by 
cutting 2, which is followed by cutting 3. The levels of cutting cannot be randomly assigned 
to the layers within a location, thus providing the repeated measurements. The assumptions 
in this model are that the layers are independent of each other with equal variances, 
but because of the repeated measurements a more complex covariance structure may be 
more appropriate. The modeling of the covariance structure for repeated measures designs 
is discussed in Chapters 26 and 27. The analysis of variance table for this model is displayed 
in Table 5.43 where the expected mean squares are obtained from the model assumptions 
about the random effects (see Chapter 18). 
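
In a balanced design such as this one, the coefficient of a variance component in an expected mean square equals the number of observations that share one level of the corresponding random effect. The Python sketch below is an illustrative check of those coefficients, not the derivation used in Chapter 18. It enumerates the 108 cells (three locations, three blocks per location, four varieties, three cuttings) and counts observations per level of each random effect in the model; the dictionary keys are just labels chosen here.

```python
from collections import Counter
from itertools import product

n_loc, n_blk, n_var, n_cut = 3, 3, 4, 3
cells = list(product(range(n_loc), range(n_blk), range(n_var), range(n_cut)))

# Each random effect is identified by the indices that define one of its levels.
random_effects = {
    "Loc":        lambda l, b, v, c: (l,),
    "Blk(Loc)":   lambda l, b, v, c: (l, b),
    "V*Loc":      lambda l, b, v, c: (l, v),
    "V*Blk(Loc)": lambda l, b, v, c: (l, b, v),
    "C*Loc":      lambda l, b, v, c: (l, c),
    "V*C*Loc":    lambda l, b, v, c: (l, v, c),
}

coefficients = {}
for name, level_of in random_effects.items():
    counts = Counter(level_of(*cell) for cell in cells)
    sizes = set(counts.values())
    assert len(sizes) == 1      # balanced: every level has the same count
    coefficients[name] = sizes.pop()

print(len(cells), coefficients)
```

For example, each block within a location contains 4 varieties x 3 cuttings = 12 cells, and each variety within a block is observed at 3 cuttings, which gives the coefficients 12 and 3 on the block and variety by block components.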


5.7 Concluding Remarks 

In this chapter, design structures involving several different sizes of experimental units 
were considered. These designs are also called multilevel designs and some multilevel 
designs are called hierarchical designs. Design types discussed included split-plot designs, 
strip-plot designs, nested designs, repeated measures designs, and several variations and 
combinations. The designs discussed in this chapter can be combined into some very complex 
designs. In such cases, the key to determining an appropriate model needed to describe 
the data is to identify the sizes of the experimental units and to identify the treatment 
structure and design structure for each size of experimental unit. Almost always, a design 
structure for a given size of experimental unit will be one of the four basic design structures 
and the error sum of squares will be computed using the process for the specific 
basic design structure. The emphasis in this chapter was on recognizing such designs and 
on when and how to use them. Analyses of designs involving more than one size of experimental 
unit are presented in Chapters 24-30, where the assumptions concerning the models 
are discussed in detail. For further discussion and examples of multilevel designs, refer 
to Milliken (2003a, b) and Milliken et al. (1998). 


5.8 Exercises 

5.1 Find two published research papers in the literature which use a two-way or 
higher-order treatment structure within a multilevel design which has two or 
more different sizes of experimental units. For each paper, describe in detail the 
treatment structure, the design structure and the different sizes of experimental 
units used in the experiment. Comment as to the appropriateness of the design 
and its analysis [at least as far as the information provided by the author(s) is 
concerned]. 

5.2 The effects of four chemical treatments and two irrigation levels on the growth 
of two cultivars of wheat were studied using eight growth chambers. Each 
growth chamber was assigned a level of irrigation and a cultivar. There were 
eight pots in each growth chamber, and the two levels of chemical were randomly 
assigned to the pots, denoted by Chem 1 or Chem 2, for a total of four pots for 
each chemical within an irrigation x cultivar combination. The randomization 
scheme is displayed in the following table. 


          Chamber Treatments          Pots in Chambers 
Chamber   Irrigation   Cultivar   Chem  Chem  Chem  Chem  Chem  Chem  Chem  Chem 
1         1            1          1     2     1     2     2     1     2     1 
2         2            1          2     2     1     1     2     2     1     1 
3         1            1          2     2     1     1     2     1     1     2 
4         1            2          2     2     1     2     2     1     1     1 
5         2            1          1     2     1     1     2     2     2     1 
6         2            2          1     2     1     2     1     2     1     2 
7         1            2          1     1     2     2     2     2     1     1 
8         2            2          1     2     2     1     2     1     1     2 

1) Write out an analysis of variance table for this experiment. 

2) Identify each size of experimental unit and the corresponding design structure 
and treatment structure. 

3) Write an effects model to describe data from this experiment. 

5.3 The effects of four chemical treatments and two irrigation levels on the growth 
of two cultivars of wheat were studied using four growth chambers. There were 
eight pots in each growth chamber, denoted by Chem 1 or Chem 2, for a total of 
four pots for each chem x irr x cult combination as displayed below, that is, the 
table shows the randomization scheme. 


          Chamber Treatments          Pots in Chambers 
Chamber   Irrigation   Cultivar   Chem  Chem  Chem  Chem  Chem  Chem  Chem  Chem 
1         1            1          1     2     1     2     1     2     1     2 
2         1            2          1     2     1     2     1     2     1     2 
3         2            1          1     2     1     2     1     2     1     2 
4         2            2          1     2     1     2     1     2     1     2 


1) Write out an analysis of variance table for this experiment. Identify the 
different experimental units and their corresponding design structures and 
treatment structures. 

2) What, if anything, do you see wrong with this design? 

3) If you were asked to help develop a design structure for this experiment, 
how would you advise the experimenter to carry out the experiment? Write 
out the analysis of variance table for your design. 

4) Provide an alternative design to the one in part 3 and write out the analysis 
of variance table for this experimental design. 

5) Discuss what you believe to be some of the advantages and disadvantages of 
the two designs in parts 3 and 4. 

5.4 An experiment was to be conducted to study the effects of three different melt 
mixing procedures (M1, M2, and M3), and nine coating lay-downs (C1, C2,..., C9), 
on the quality of photographic film. One of the steps in film-making requires that 
the film be placed in baskets while being processed in a standard way. This standard 
process was not being changed in any way. The baskets are large enough to 
hold 18 strips of film. The nine coating lay-downs were randomly assigned to two 
of the 18 film strips within each basket. The melt mixing procedures were randomly 
assigned to the baskets. The three baskets formed one complete replication 
of the experiment. This whole process was repeated three more times on different 
days, providing four complete replicates of the treatment combinations. 

1) What are the different sizes of experimental units? 

2) What is the treatment structure for this experiment? 

3) What is the design structure for this experiment? 
4) What is the design structure and treatment structure for each size of experimental
unit?

5) Write out the analysis of variance table for this experiment. 

5.5 Suppose the experiment in Exercise 5.4 is conducted in the following manner: For 
a basket selected at random, one of the nine coating lay-downs and two of the 
three melt mixing procedures are assigned at random to the 18 film strips in the 
basket. For example, one basket might contain the combinations, M1C1, M1C2,..., 
M1C9, M2C1, M2C2,..., M2C9. A second basket gets a different pair of the melt 
mixing procedures combined with all nine coating lay-downs. Thus the second 
basket may contain the combinations, M1C1, M1C2,..., M1C9, M3C1, M3C2,..., 
M3C9. The third basket gets the remaining pair of melt mixing procedures and 
all nine coating lay-downs, so the third basket would contain the combinations, 
M2C1, M2C2,..., M2C9, M3C1, M3C2,..., M3C9. This whole process would be 
repeated three more times on different days, providing four complete replicates. 

1) What are the different sizes of experimental units? 

2) What is the treatment structure for this experiment? 

3) What is the design structure for this experiment? 

4) What is the design structure and treatment structure for each size of experimental
unit?

5) Write out the analysis of variance table for this experiment. 

5.6 Discuss what you believe to be the advantages and disadvantages of the two 
designs used in Exercises 5.4 and 5.5. Which design would you recommend and 
why? 

5.7 A baker wants to evaluate the effect of baking temperature on different formulations
of bread. During one time period, she has two ovens, each of which can
bake three loaves of bread at one time. The baker wants to evaluate three baking
temperatures and four bread formulations. On a given day, she mixes three
batches of bread dough using three of the formulations and forms two loaves
from each batch. Next, one loaf from each batch is placed into each of the two ovens
and baked at one of the three temperatures. The following table displays the
formulations and temperatures used on each of 12 days:


Temperatures and Formulations Used Each Day of the
Experiment for Exercise 5.7

Day   Baking Temperatures (°C)   Bread Formulations
1     160   190                  A   B   D
2     160   190                  A   C   D
3     160   190                  A   B   C
4     160   175                  A   C   D
5     160   175                  B   C   D
6     175   190                  A   B   D
7     175   190                  A   B   C
8     175   190                  B   C   D
9     160   175                  A   B   D
10    160   190                  B   C   D
11    175   190                  A   C   D
12    160   175                  A   B   C

1) Identify the different sizes of experimental units (draw a diagram).

2) What are the design and treatment structures for each of the experimental 
units you identified in part 1? 

3) Write out an analysis of variance table for each size of experimental unit. 

4) Write out a model to describe data from this experiment and write out the 
analysis of variance table. 




Matrix Form of the Model 


Summation notation becomes very laborious and sometimes nearly impossible to use 
when one works with unbalanced fixed effects models, random effects models, or mixed 
effects models. This problem can be solved by using a matrix form representation of the 
model. This chapter contains a discussion of the construction of the matrix form of the 
model and a description of how to use the matrices to obtain least squares estimators, to 
test hypotheses, to compute least squares or population marginal means, and to construct 
confidence intervals. The concept of estimability is discussed in Section 6.3. 


6.1 Basic Notation 


The matrix form of a model can be expressed as 

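$$\underset{n \times 1}{y} = \underset{n \times p}{X}\;\underset{p \times 1}{\beta} + \underset{n \times 1}{\varepsilon} \qquad (6.1)$$

where $y$ denotes an $n \times 1$ vector of observations, $X$ denotes an $n \times p$ matrix of
known constants, called the design matrix, $\beta$ denotes a $p \times 1$ vector of unknown
parameters, and $\varepsilon$ denotes an $n \times 1$ vector of unobserved errors. The model for the
$i$th observation (the $i$th element of $y$) is of the form

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_{p-1} x_{i,p-1} + \varepsilon_i, \quad i = 1, 2, \ldots, n \qquad (6.2)$$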
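The vectors and matrices used to represent model (6.2) as a matrix model (6.1) are

$$
y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad
X = \begin{bmatrix}
1 & x_{11} & x_{12} & \cdots & x_{1,p-1} \\
1 & x_{21} & x_{22} & \cdots & x_{2,p-1} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_{n1} & x_{n2} & \cdots & x_{n,p-1}
\end{bmatrix}, \quad
\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \vdots \\ \beta_{p-1} \end{bmatrix}, \quad \text{and} \quad
\varepsilon = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}
\qquad (6.3)
$$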


Matrices of the type in Equation 6.3 can be used to represent many model types, including
such design models as one-way models, two-way models, factorial models, and fractional
factorial models as well as regression models, analysis of covariance models, random
effects models, mixed effects models, split-plot models, repeated measures models, and
random coefficient regression models by specifying the appropriate elements for $X$ and the
appropriate assumptions on $\beta$ and $\varepsilon$. The following sections present some matrix models
for various experimental situations.


6.1.1 Simple Linear Regression Model 

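The simple linear regression model can be expressed as $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$,
$i = 1, 2, \ldots, n$, and can be represented in matrix form as

$$
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} =
\begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} +
\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}
$$

The column of 1s in $X$ corresponds to the intercept of the regression model, $\beta_0$, and the
column of $x_i$ values corresponds to the slope of the regression model, $\beta_1$.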
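
As a rough illustration of this matrix form, the following Python sketch builds the design
matrix for a simple linear regression. NumPy, the function name `regression_design_matrix`,
and the predictor values are assumptions made for this example; none of them come from
the text.

```python
import numpy as np

def regression_design_matrix(x):
    """Return the n x 2 design matrix whose rows are [1, x_i]."""
    x = np.asarray(x, dtype=float)
    # First column: 1s for the intercept; second column: the predictor values.
    return np.column_stack([np.ones_like(x), x])

# Hypothetical predictor values, chosen only for illustration.
x_demo = [1.0, 2.0, 3.0, 4.0]
X_demo = regression_design_matrix(x_demo)
```

Each row of `X_demo` multiplies the parameter vector (intercept, slope) to give the mean of
the corresponding observation.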


6.1.2 One-Way Treatment Structure Model 

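To represent the model for a one-way treatment structure with $t$ treatments in a completely
randomized design structure with $n_i$ observations for the $i$th treatment, let the independent
variables $x_{kij}$ be defined by

$$
x_{kij} = \begin{cases}
0 & \text{if the } ij\text{th observation is not from the } k\text{th treatment} \\
1 & \text{if the } ij\text{th observation is from the } k\text{th treatment}
\end{cases}
$$

for $i = 1, 2, \ldots, t$ and $j = 1, 2, \ldots, n_i$. The variable $x_{kij}$ is called an indicator variable
in that, when it takes on the value of 1, it indicates that the observation is from treatment $k$.
When the value of $x_{kij}$ is equal to 0, it indicates that the observation is not from treatment $k$.
The means model can be expressed as

$$y_{ij} = \mu_1 x_{1ij} + \mu_2 x_{2ij} + \cdots + \mu_t x_{tij} + \varepsilon_{ij} \quad \text{for } i = 1, 2, \ldots, t \text{ and } j = 1, 2, \ldots, n_i$$

or in matrix notation

$$
\begin{bmatrix} y_{11} \\ y_{12} \\ \vdots \\ y_{1n_1} \\ y_{21} \\ y_{22} \\ \vdots \\ y_{2n_2} \\ \vdots \\ y_{t1} \\ \vdots \\ y_{tn_t} \end{bmatrix} =
\begin{bmatrix}
1 & 0 & \cdots & 0 \\
1 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 1 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 1
\end{bmatrix}
\begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_t \end{bmatrix} + \varepsilon
$$

The means model is generally expressed as $y_{ij} = \mu_i + \varepsilon_{ij}$ for $i = 1, 2, \ldots, t$ and
$j = 1, 2, \ldots, n_i$. The effects model can be expressed as

$$y_{ij} = \mu + \tau_1 x_{1ij} + \tau_2 x_{2ij} + \cdots + \tau_t x_{tij} + \varepsilon_{ij} \quad \text{for } i = 1, 2, \ldots, t \text{ and } j = 1, 2, \ldots, n_i$$

or in matrix notation

$$
\begin{bmatrix} y_{11} \\ y_{12} \\ \vdots \\ y_{1n_1} \\ y_{21} \\ y_{22} \\ \vdots \\ y_{2n_2} \\ \vdots \\ y_{t1} \\ \vdots \\ y_{tn_t} \end{bmatrix} =
\begin{bmatrix}
1 & 1 & 0 & \cdots & 0 \\
1 & 1 & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
1 & 1 & 0 & \cdots & 0 \\
1 & 0 & 1 & \cdots & 0 \\
1 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
1 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
1 & 0 & 0 & \cdots & 1 \\
\vdots & \vdots & \vdots & & \vdots \\
1 & 0 & 0 & \cdots & 1
\end{bmatrix}
\begin{bmatrix} \mu \\ \tau_1 \\ \tau_2 \\ \vdots \\ \tau_t \end{bmatrix} + \varepsilon
$$

The effects model is generally expressed as $y_{ij} = \mu + \tau_i + \varepsilon_{ij}$ for $i = 1, 2, \ldots, t$ and
$j = 1, 2, \ldots, n_i$. The difference between the means model and the effects model is that the design
matrix for the effects model contains a column of 1s for the intercept of the model, $\mu$, while
the means model does not contain that column of 1s.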
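
The two one-way design matrices can be sketched in Python as follows. This is only an
illustration: NumPy, the helper name `one_way_design`, and the sample sizes in `n_demo`
are assumptions, not part of the text. The rank computation previews why the effects model
is later described as overspecified: the extra column of 1s adds a parameter without adding
rank.

```python
import numpy as np

def one_way_design(n_per_treatment, effects=False):
    """Indicator-variable design matrix for a one-way treatment structure.

    n_per_treatment[i] is the number of observations for treatment i.
    With effects=True a leading column of 1s is added for mu.
    """
    t = len(n_per_treatment)
    # Treatment label (0, ..., t-1) of every observation, in treatment order.
    labels = np.repeat(np.arange(t), n_per_treatment)
    # Column k is 1 when the observation is from treatment k and 0 otherwise.
    X = (labels[:, None] == np.arange(t)).astype(float)
    if effects:
        X = np.column_stack([np.ones(len(labels)), X])
    return X

n_demo = [2, 3, 2]  # hypothetical n_1, n_2, n_3
X_means = one_way_design(n_demo)
X_effects = one_way_design(n_demo, effects=True)
rank_means = np.linalg.matrix_rank(X_means)
rank_effects = np.linalg.matrix_rank(X_effects)
```

Here `X_means` has three columns and `X_effects` has four, yet both have the same rank.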


6.1.3 Two-Way Treatment Structure Model 

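A form of the model used for a two-way treatment structure in a completely randomized
design structure with $t$ row treatments and $b$ column treatments is

$$y_{ijk} = \mu_{ij} + \varepsilon_{ijk}, \quad i = 1, 2, \ldots, t, \; j = 1, 2, \ldots, b, \text{ and } k = 1, 2, \ldots, n_{ij} \qquad (6.4)$$

The model used in Equation 6.4, called the means model, can be represented in matrix
form as

$$
\begin{bmatrix}
y_{111} \\ y_{112} \\ \vdots \\ y_{11n_{11}} \\ y_{121} \\ \vdots \\ y_{12n_{12}} \\ \vdots \\ y_{1b1} \\ \vdots \\ y_{1bn_{1b}} \\ y_{211} \\ \vdots \\ y_{21n_{21}} \\ \vdots \\ y_{tb1} \\ \vdots \\ y_{tbn_{tb}}
\end{bmatrix} =
\begin{bmatrix}
1 & 0 & \cdots & 0 & 0 & \cdots & 0 \\
1 & 0 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots \\
1 & 0 & \cdots & 0 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots \\
0 & 1 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots \\
0 & 0 & \cdots & 1 & 0 & \cdots & 0 \\
\vdots \\
0 & 0 & \cdots & 1 & 0 & \cdots & 0 \\
0 & 0 & \cdots & 0 & 1 & \cdots & 0 \\
\vdots \\
0 & 0 & \cdots & 0 & 1 & \cdots & 0 \\
\vdots \\
0 & 0 & \cdots & 0 & 0 & \cdots & 1 \\
\vdots \\
0 & 0 & \cdots & 0 & 0 & \cdots & 1
\end{bmatrix}
\begin{bmatrix} \mu_{11} \\ \mu_{12} \\ \vdots \\ \mu_{1b} \\ \mu_{21} \\ \vdots \\ \mu_{tb} \end{bmatrix} + \varepsilon
$$

The two-way effects model can be expressed as

$$y_{ijk} = \mu + \tau_i + \beta_j + \gamma_{ij} + \varepsilon_{ijk}, \quad i = 1, 2, \ldots, t, \; j = 1, 2, \ldots, b, \; k = 1, 2, \ldots, n_{ij} \qquad (6.5)$$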
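The matrix form of the two-way effects model is:

$$
\begin{bmatrix}
y_{111} \\ y_{112} \\ \vdots \\ y_{11n_{11}} \\ y_{121} \\ \vdots \\ y_{12n_{12}} \\ \vdots \\ y_{1b1} \\ \vdots \\ y_{1bn_{1b}} \\ y_{211} \\ \vdots \\ y_{21n_{21}} \\ \vdots \\ y_{tb1} \\ \vdots \\ y_{tbn_{tb}}
\end{bmatrix} =
\left[\begin{array}{c|cccc|cccc|ccccccc}
1 & 1 & 0 & \cdots & 0 & 1 & 0 & \cdots & 0 & 1 & 0 & \cdots & 0 & 0 & \cdots & 0 \\
1 & 1 & 0 & \cdots & 0 & 1 & 0 & \cdots & 0 & 1 & 0 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots \\
1 & 1 & 0 & \cdots & 0 & 1 & 0 & \cdots & 0 & 1 & 0 & \cdots & 0 & 0 & \cdots & 0 \\
1 & 1 & 0 & \cdots & 0 & 0 & 1 & \cdots & 0 & 0 & 1 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots \\
1 & 1 & 0 & \cdots & 0 & 0 & 1 & \cdots & 0 & 0 & 1 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots \\
1 & 1 & 0 & \cdots & 0 & 0 & 0 & \cdots & 1 & 0 & 0 & \cdots & 1 & 0 & \cdots & 0 \\
\vdots \\
1 & 1 & 0 & \cdots & 0 & 0 & 0 & \cdots & 1 & 0 & 0 & \cdots & 1 & 0 & \cdots & 0 \\
1 & 0 & 1 & \cdots & 0 & 1 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 1 & \cdots & 0 \\
\vdots \\
1 & 0 & 1 & \cdots & 0 & 1 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 1 & \cdots & 0 \\
\vdots \\
1 & 0 & 0 & \cdots & 1 & 0 & 0 & \cdots & 1 & 0 & 0 & \cdots & 0 & 0 & \cdots & 1 \\
\vdots \\
1 & 0 & 0 & \cdots & 1 & 0 & 0 & \cdots & 1 & 0 & 0 & \cdots & 0 & 0 & \cdots & 1
\end{array}\right]
\begin{bmatrix}
\mu \\ \tau_1 \\ \vdots \\ \tau_t \\ \beta_1 \\ \vdots \\ \beta_b \\ \gamma_{11} \\ \gamma_{12} \\ \vdots \\ \gamma_{1b} \\ \gamma_{21} \\ \vdots \\ \gamma_{tb}
\end{bmatrix} + \varepsilon
$$

The column blocks of the design matrix correspond, from left to right, to $\mu$, to
$\tau_1, \ldots, \tau_t$, to $\beta_1, \ldots, \beta_b$, and to $\gamma_{11}, \ldots, \gamma_{tb}$.
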

This book emphasizes models that correspond to experimental design situations rather 
than purely regression situations. The next two examples demonstrate how to construct 
such models from the data structure. 


6.1.4 Example 6.1: Means Model for Two-Way Treatment Structure 

The information in Table 6.1 represents data from a two-way treatment structure in a completely
randomized design structure where there are three row treatments and three
column treatments and one or two observations per cell.

TABLE 6.1
Data for a Two-Way Treatment Structure in a CRD Structure

                  Column Treatment
Row Treatment     1       2       3
1                 3, 6    9       10
2                 2       5, 3    8
3                 4       2       6

The matrix form of the means model for the data in Table 6.1 is:


$$
\begin{bmatrix} 3 \\ 6 \\ 9 \\ 10 \\ 2 \\ 5 \\ 3 \\ 8 \\ 4 \\ 2 \\ 6 \end{bmatrix} =
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} \mu_{11} \\ \mu_{12} \\ \mu_{13} \\ \mu_{21} \\ \mu_{22} \\ \mu_{23} \\ \mu_{31} \\ \mu_{32} \\ \mu_{33} \end{bmatrix} + \varepsilon
$$


The matrix form of the effects model for the data in Table 6.1 is 


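$$
\begin{bmatrix} 3 \\ 6 \\ 9 \\ 10 \\ 2 \\ 5 \\ 3 \\ 8 \\ 4 \\ 2 \\ 6 \end{bmatrix} =
\left[\begin{array}{c|ccc|ccc|ccccccccc}
1 & 1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
1 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
1 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{array}\right]
\begin{bmatrix}
\mu \\ \tau_1 \\ \tau_2 \\ \tau_3 \\ \beta_1 \\ \beta_2 \\ \beta_3 \\ \gamma_{11} \\ \gamma_{12} \\ \gamma_{13} \\ \gamma_{21} \\ \gamma_{22} \\ \gamma_{23} \\ \gamma_{31} \\ \gamma_{32} \\ \gamma_{33}
\end{bmatrix} + \varepsilon
$$

which can be written as

$$y = j\mu + X_1\tau + X_2\beta + X_3\gamma + \varepsilon$$

where $j$ is an $11 \times 1$ vector of ones corresponding to the first column of the above design
matrix, $X_1$ is an $11 \times 3$ matrix corresponding to columns 2-4, $X_2$ is an $11 \times 3$ matrix
corresponding to columns 5-7, and $X_3$ is an $11 \times 9$ matrix corresponding to the last nine columns
of the above design matrix.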
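
The partitioned design matrix for Table 6.1 can be assembled programmatically. The sketch
below is one possible way to do it in Python with NumPy; the variable names (`table_6_1`,
`row`, `col`, and so on) are illustrative choices, not notation from the text.

```python
import numpy as np

# Table 6.1 as (row treatment, column treatment, response), cell by cell.
table_6_1 = [(1, 1, 3), (1, 1, 6), (1, 2, 9), (1, 3, 10),
             (2, 1, 2), (2, 2, 5), (2, 2, 3), (2, 3, 8),
             (3, 1, 4), (3, 2, 2), (3, 3, 6)]

row = np.array([r for r, _, _ in table_6_1]) - 1   # 0-based row treatment
col = np.array([c for _, c, _ in table_6_1]) - 1   # 0-based column treatment
y = np.array([v for _, _, v in table_6_1], dtype=float)

j = np.ones((len(y), 1))          # column of ones for mu
X1 = np.eye(3)[row]               # indicators for tau_1, tau_2, tau_3
X2 = np.eye(3)[col]               # indicators for beta_1, beta_2, beta_3
X3 = np.eye(9)[3 * row + col]     # indicators for gamma_11, ..., gamma_33

X = np.hstack([j, X1, X2, X3])    # full 11 x 16 effects-model design matrix
rank_X = np.linalg.matrix_rank(X)
```

Note that `X3` on its own is the design matrix of the means model, and that the rank of the
16-column matrix `X` equals the number of cells, nine.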
The design matrices for other treatment and design structures are constructed in a similar
fashion. Fortunately, most software that fits models to unbalanced data structures uses
the above types of representations and automatically generates the necessary columns of
the design matrix when one specifies the categorical effects in the model.


6.2 Least Squares Estimation 

Once the model is specified in matrix form, the next step in the analysis is to obtain the
least squares estimator for the parameter vector $\beta$. The method of least squares can be
used to estimate the parameters of the model. To use this method, assume that the model
can be expressed as

$$y_i = f(x_i; \beta) + \varepsilon_i \quad \text{for } i = 1, 2, \ldots, n \qquad (6.6)$$

where $f(x_i; \beta)$ is a function of the vector of design variables indicated by $x_i$ and depends
on the parameter vector $\beta$. The least squares estimator of $\beta$ is the value of $\beta$, usually denoted
by $\hat{\beta}$, that minimizes the sum of squares

$$SS(\beta) = \sum_{i=1}^{n} [y_i - f(x_i; \beta)]^2 \qquad (6.7)$$

If, in addition to assuming the model is of the form (6.6), one assumes that $\varepsilon_i \sim$ i.i.d.
$N(0, \sigma^2)$, $i = 1, 2, \ldots, n$, then the least squares estimate of $\beta$ is also a maximum likelihood
estimator.

For example, the model function for the means model of a one-way treatment structure
in a completely randomized design is

$$f(x_{ij}; \beta) = \mu_i, \quad i = 1, 2, \ldots, t; \; j = 1, 2, \ldots, n_i$$

The least squares estimators of the $\mu_i$ are the values, say $\hat{\mu}_1, \ldots, \hat{\mu}_t$, that minimize

$$SS(\mu) = \sum_{i=1}^{t} \sum_{j=1}^{n_i} (y_{ij} - \mu_i)^2$$

The model function for the means model of a two-way treatment structure in a completely
randomized design is

$$f(x_{ijk}; \beta) = \mu_{ij}, \quad i = 1, 2, \ldots, t; \; j = 1, 2, \ldots, b; \; k = 1, 2, \ldots, n_{ij}$$

The least squares estimators of the $\mu_{ij}$ are the values, say $\hat{\mu}_{11}, \ldots, \hat{\mu}_{tb}$, that minimize

$$SS(\mu) = \sum_{i=1}^{t} \sum_{j=1}^{b} \sum_{k=1}^{n_{ij}} (y_{ijk} - \mu_{ij})^2$$

The model function for the effects model of a two-way treatment structure in a completely
randomized design is

$$f(x_{ijk}; \beta) = \mu + \tau_i + \beta_j + \gamma_{ij}, \quad i = 1, 2, \ldots, t; \; j = 1, 2, \ldots, b; \; k = 1, 2, \ldots, n_{ij}$$

The least squares estimators of $\mu$, $\tau_i$, $\beta_j$, and $\gamma_{ij}$ are obtained by minimizing

$$SS(\mu, \tau_i, \beta_j, \gamma_{ij}) = \sum_{i=1}^{t} \sum_{j=1}^{b} \sum_{k=1}^{n_{ij}} (y_{ijk} - \mu - \tau_i - \beta_j - \gamma_{ij})^2$$

In general, the model can be written in a matrix form like that in Equation 6.1, and the
least squares estimator of $\beta$ is the value $\hat{\beta}$ that minimizes the sum of squares

$$SS(\beta) = (y - X\beta)'(y - X\beta) \qquad (6.8)$$


6.2.1 Least Squares Equations 

Matrix representations and calculus can be used to determine the value of $\beta$ that minimizes
the residual sum of squares. When one carries out the minimization, a set of equations
is obtained that $\hat{\beta}$ must satisfy; those equations are called the least squares
equations or the normal equations of the model. The normal equations for model (6.1) are
given by

$$X'X\hat{\beta} = X'y \qquad (6.9)$$

Any vector $\hat{\beta}$ that satisfies the normal equations is a least squares estimator of $\beta$. The
least squares estimator need not be unique for some models. To help the reader become
more familiar with the normal equations, the normal equations for the models discussed
in Section 6.1 are provided below.
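
The statement that the least squares estimator need not be unique can be checked numerically.
The following Python sketch uses a small, made-up one-way data set (the responses and
group sizes are hypothetical, chosen so the treatment means are 5, 8, and 2) and shows two
different parameter vectors that both satisfy the normal equations of the effects model.

```python
import numpy as np

# Hypothetical one-way data: three treatments with 2, 3, and 2 observations.
labels = np.array([0, 0, 1, 1, 1, 2, 2])
y = np.array([4.0, 6.0, 7.0, 9.0, 8.0, 1.0, 3.0])

# Effects-model design matrix: a column of 1s, then one indicator per treatment.
indicators = (labels[:, None] == np.arange(3)).astype(float)
X = np.column_stack([np.ones(len(y)), indicators])

XtX = X.T @ X
Xty = X.T @ y

# Two candidate solutions (mu, tau_1, tau_2, tau_3).
beta_a = np.array([2.0, 3.0, 6.0, 0.0])    # last effect set to zero
beta_b = np.array([5.0, 0.0, 3.0, -3.0])   # effects sum to zero

satisfies_a = np.allclose(XtX @ beta_a, Xty)
satisfies_b = np.allclose(XtX @ beta_b, Xty)
```

Both vectors reproduce the same fitted values `X @ beta`, which is why either one minimizes
the residual sum of squares.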

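The normal equations for the one-way means model are

$$
\begin{bmatrix}
n_1 & 0 & \cdots & 0 \\
0 & n_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & n_t
\end{bmatrix}
\begin{bmatrix} \hat{\mu}_1 \\ \hat{\mu}_2 \\ \vdots \\ \hat{\mu}_t \end{bmatrix} =
\begin{bmatrix} y_{1\cdot} \\ y_{2\cdot} \\ \vdots \\ y_{t\cdot} \end{bmatrix}
\quad \text{where } y_{i\cdot} = \sum_{j=1}^{n_i} y_{ij}
$$
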

The normal equations for the one-way effects model are 


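$$
\begin{bmatrix}
n_{\cdot} & n_1 & n_2 & \cdots & n_t \\
n_1 & n_1 & 0 & \cdots & 0 \\
n_2 & 0 & n_2 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
n_t & 0 & 0 & \cdots & n_t
\end{bmatrix}
\begin{bmatrix} \hat{\mu} \\ \hat{\tau}_1 \\ \hat{\tau}_2 \\ \vdots \\ \hat{\tau}_t \end{bmatrix} =
\begin{bmatrix} y_{\cdot\cdot} \\ y_{1\cdot} \\ y_{2\cdot} \\ \vdots \\ y_{t\cdot} \end{bmatrix}
\quad \text{where } y_{\cdot\cdot} = \sum_{i=1}^{t} \sum_{j=1}^{n_i} y_{ij} \text{ and } n_{\cdot} = \sum_{i=1}^{t} n_i
$$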
The normal equations for the two-way means model using the data in Table 6.1 are

$$
\begin{bmatrix}
2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 2 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} \hat{\mu}_{11} \\ \hat{\mu}_{12} \\ \hat{\mu}_{13} \\ \hat{\mu}_{21} \\ \hat{\mu}_{22} \\ \hat{\mu}_{23} \\ \hat{\mu}_{31} \\ \hat{\mu}_{32} \\ \hat{\mu}_{33} \end{bmatrix} =
\begin{bmatrix} y_{11\cdot} \\ y_{12\cdot} \\ y_{13\cdot} \\ y_{21\cdot} \\ y_{22\cdot} \\ y_{23\cdot} \\ y_{31\cdot} \\ y_{32\cdot} \\ y_{33\cdot} \end{bmatrix}
\quad \text{where } y_{ij\cdot} = \sum_{k=1}^{n_{ij}} y_{ijk}
$$



The normal equations for the two-way effects model corresponding to the data in 
Table 6.1 are 
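$$
\left[\begin{array}{c|ccc|ccc|ccccccccc}
11 & 4 & 4 & 3 & 4 & 4 & 3 & 2 & 1 & 1 & 1 & 2 & 1 & 1 & 1 & 1 \\
\hline
4 & 4 & 0 & 0 & 2 & 1 & 1 & 2 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
4 & 0 & 4 & 0 & 1 & 2 & 1 & 0 & 0 & 0 & 1 & 2 & 1 & 0 & 0 & 0 \\
3 & 0 & 0 & 3 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 \\
\hline
4 & 2 & 1 & 1 & 4 & 0 & 0 & 2 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 \\
4 & 1 & 2 & 1 & 0 & 4 & 0 & 0 & 1 & 0 & 0 & 2 & 0 & 0 & 1 & 0 \\
3 & 1 & 1 & 1 & 0 & 0 & 3 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 1 \\
\hline
2 & 2 & 0 & 0 & 2 & 0 & 0 & 2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
2 & 0 & 2 & 0 & 0 & 2 & 0 & 0 & 0 & 0 & 0 & 2 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
1 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
1 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{array}\right]
\begin{bmatrix}
\hat{\mu} \\ \hat{\tau}_1 \\ \hat{\tau}_2 \\ \hat{\tau}_3 \\ \hat{\beta}_1 \\ \hat{\beta}_2 \\ \hat{\beta}_3 \\ \hat{\gamma}_{11} \\ \hat{\gamma}_{12} \\ \hat{\gamma}_{13} \\ \hat{\gamma}_{21} \\ \hat{\gamma}_{22} \\ \hat{\gamma}_{23} \\ \hat{\gamma}_{31} \\ \hat{\gamma}_{32} \\ \hat{\gamma}_{33}
\end{bmatrix} =
\begin{bmatrix}
y_{\cdot\cdot\cdot} \\ y_{1\cdot\cdot} \\ y_{2\cdot\cdot} \\ y_{3\cdot\cdot} \\ y_{\cdot 1\cdot} \\ y_{\cdot 2\cdot} \\ y_{\cdot 3\cdot} \\ y_{11\cdot} \\ y_{12\cdot} \\ y_{13\cdot} \\ y_{21\cdot} \\ y_{22\cdot} \\ y_{23\cdot} \\ y_{31\cdot} \\ y_{32\cdot} \\ y_{33\cdot}
\end{bmatrix}
$$

where

$$
y_{\cdot\cdot\cdot} = \sum_{i=1}^{3} \sum_{j=1}^{3} \sum_{k=1}^{n_{ij}} y_{ijk}, \quad
y_{i\cdot\cdot} = \sum_{j=1}^{3} \sum_{k=1}^{n_{ij}} y_{ijk}, \quad
y_{\cdot j\cdot} = \sum_{i=1}^{3} \sum_{k=1}^{n_{ij}} y_{ijk}, \quad \text{and} \quad
y_{ij\cdot} = \sum_{k=1}^{n_{ij}} y_{ijk}
$$
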
When the $X'X$ matrix is of full rank (Graybill, 1976), that is, $X'X$ is nonsingular, then the inverse
of $X'X$ exists and the least squares estimator for $\beta$ (the solution for $\hat{\beta}$ in Equation 6.9) is

$$\hat{\beta} = (X'X)^{-1}X'y \qquad (6.10)$$

When $X'X$ is of full rank, the least squares estimator is unique. Computing the inverse of
$X'X$ is generally not an easy task. One of the most important aspects of the development of
computing software is that now statisticians can invert very large matrices that would not
have been attempted before computers became available. However, when there are certain
patterns in $X'X$, the patterns can be exploited and the inverse can be more easily computed.
For the normal equations for the one-way means model and the two-way means
model, $X'X$ is diagonal (all diagonal elements are nonzero and off diagonal elements are
zero) and the inverse of $X'X$ is obtained by simply replacing each diagonal element by its
reciprocal. Thus, the least squares estimators of the $\mu_i$ for the one-way means model are

$$
\begin{bmatrix} \hat{\mu}_1 \\ \hat{\mu}_2 \\ \vdots \\ \hat{\mu}_t \end{bmatrix} =
\begin{bmatrix}
\frac{1}{n_1} & 0 & \cdots & 0 \\
0 & \frac{1}{n_2} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \frac{1}{n_t}
\end{bmatrix}
\begin{bmatrix} y_{1\cdot} \\ y_{2\cdot} \\ \vdots \\ y_{t\cdot} \end{bmatrix}
$$

or equivalently,

$$\hat{\mu}_i = \frac{y_{i\cdot}}{n_i} = \bar{y}_{i\cdot}, \quad i = 1, 2, \ldots, t$$

Similarly, the least squares estimator of $\mu_{ij}$ for the two-way means model is

$$\hat{\mu}_{ij} = \frac{y_{ij\cdot}}{n_{ij}} = \bar{y}_{ij\cdot}, \quad i = 1, 2, \ldots, t, \; j = 1, 2, \ldots, b$$
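
For the data in Table 6.1, the reciprocal-of-the-diagonal shortcut can be verified directly.
The following Python sketch (NumPy and all variable names are assumptions of this example)
computes the means-model estimates both by dividing cell totals by cell counts and by
solving the normal equations in general.

```python
import numpy as np

# Table 6.1 as (row treatment, column treatment, response).
table_6_1 = [(1, 1, 3), (1, 1, 6), (1, 2, 9), (1, 3, 10),
             (2, 1, 2), (2, 2, 5), (2, 2, 3), (2, 3, 8),
             (3, 1, 4), (3, 2, 2), (3, 3, 6)]
cell = np.array([3 * (r - 1) + (c - 1) for r, c, _ in table_6_1])
y = np.array([v for _, _, v in table_6_1], dtype=float)

X = np.eye(9)[cell]      # means-model design matrix, one column per cell
XtX = X.T @ X            # diagonal matrix of cell counts n_ij
Xty = X.T @ y            # vector of cell totals y_ij.

# Inverting a diagonal matrix amounts to taking reciprocals of the diagonal.
mu_hat = Xty / np.diag(XtX)
# The general formula (X'X)^{-1} X'y gives the same answer.
mu_hat_general = np.linalg.solve(XtX, Xty)
```

The nine entries of `mu_hat` are the cell means in the order $\hat{\mu}_{11}, \hat{\mu}_{12}, \ldots, \hat{\mu}_{33}$.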

Unlike the normal equations in means models, the $X'X$ matrices for the effects models
are singular and the inverses of the $X'X$ matrices do not exist. In this case there are many
solutions to the normal equations (in fact an infinite number of least squares solutions).
The effects models are called overspecified or singular models in that the models have more
parameters than can be uniquely estimated from the data collected. Overspecified models
are commonly used and there are several ways to solve their corresponding normal equations.
The following discussion addresses two-way treatment structures in a completely
randomized design structure, but similar techniques can be used in other factorial effects
models.

Theoretically a generalized inverse can be used to solve the normal equations for $\hat{\beta}$
(Graybill, 1976), but a commonly used method for solving the normal equations of an
overspecified model is to place restrictions on the parameters in the model (which in effect
generates a g-inverse solution). Placing restrictions on the parameters of the model can be
accomplished in many ways, two of which are considered here.
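
To make the generalized-inverse route concrete, the sketch below solves the singular normal
equations of the two-way effects model for Table 6.1 using NumPy's `np.linalg.pinv`. The
Moore-Penrose inverse is just one particular generalized inverse, picked here because it is
readily available; it is not necessarily the one used by any given statistical package.

```python
import numpy as np

# Table 6.1 as (row treatment, column treatment, response).
table_6_1 = [(1, 1, 3), (1, 1, 6), (1, 2, 9), (1, 3, 10),
             (2, 1, 2), (2, 2, 5), (2, 2, 3), (2, 3, 8),
             (3, 1, 4), (3, 2, 2), (3, 3, 6)]
row = np.array([r for r, _, _ in table_6_1]) - 1
col = np.array([c for _, c, _ in table_6_1]) - 1
y = np.array([v for _, _, v in table_6_1], dtype=float)

# Overspecified effects-model design matrix (11 x 16).
X = np.hstack([np.ones((len(y), 1)), np.eye(3)[row], np.eye(3)[col],
               np.eye(9)[3 * row + col]])
XtX = X.T @ X
Xty = X.T @ y

rank_XtX = np.linalg.matrix_rank(XtX)     # smaller than 16, so X'X is singular
beta_g = np.linalg.pinv(XtX) @ Xty        # one of infinitely many solutions
solves_normal_eqs = np.allclose(XtX @ beta_g, Xty)
fitted = X @ beta_g                       # fitted values do not depend on the choice
```

Although `beta_g` itself is arbitrary in the sense discussed above, the fitted values are the
cell means, exactly as for the means model.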
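6.2.2 Sum-to-Zero Restrictions

One common technique is to require the sums of certain parameters to be equal to zero.
This procedure has been used to solve normal equations from the very beginning of the
analysis of variance. For the two-way effects model using the data in Table 6.1, the
sum-to-zero restrictions are

$$\sum_{i=1}^{3} \tau_i = 0, \quad \sum_{j=1}^{3} \beta_j = 0, \quad \sum_{i=1}^{3} \gamma_{i1} = 0, \quad \sum_{i=1}^{3} \gamma_{i2} = 0, \quad \sum_{i=1}^{3} \gamma_{i3} = 0,$$

$$\sum_{j=1}^{3} \gamma_{1j} = 0, \quad \sum_{j=1}^{3} \gamma_{2j} = 0, \quad \text{and} \quad \sum_{j=1}^{3} \gamma_{3j} = 0$$

Next, these restrictions are incorporated into the model by solving for some of the parameters
in terms of others with the restrictions being taken into account, and then substituting
the expressions back into the model. For example, the parameters that can be replaced are

$$\tau_3 = -\tau_1 - \tau_2, \quad \beta_3 = -\beta_1 - \beta_2, \quad \gamma_{13} = -\gamma_{11} - \gamma_{12},$$

$$\gamma_{23} = -\gamma_{21} - \gamma_{22}, \quad \gamma_{31} = -\gamma_{11} - \gamma_{21}, \quad \gamma_{32} = -\gamma_{12} - \gamma_{22},$$

$$\gamma_{33} = -\gamma_{13} - \gamma_{23} = -\gamma_{31} - \gamma_{32} = \gamma_{11} + \gamma_{12} + \gamma_{21} + \gamma_{22}$$

Thus, replace $\tau_3$, $\beta_3$, $\gamma_{13}$, $\gamma_{23}$, $\gamma_{33}$, $\gamma_{31}$, and $\gamma_{32}$ in the model to obtain a reparameterized
model

$$
y = \begin{bmatrix}
1 & 1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 \\
1 & 1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 1 & 0 & 1 & 0 & 0 \\
1 & 1 & 0 & -1 & -1 & -1 & -1 & 0 & 0 \\
1 & 0 & 1 & 1 & 0 & 0 & 0 & 1 & 0 \\
1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 1 \\
1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 1 \\
1 & 0 & 1 & -1 & -1 & 0 & 0 & -1 & -1 \\
1 & -1 & -1 & 1 & 0 & -1 & 0 & -1 & 0 \\
1 & -1 & -1 & 0 & 1 & 0 & -1 & 0 & -1 \\
1 & -1 & -1 & -1 & -1 & 1 & 1 & 1 & 1
\end{bmatrix}
\begin{bmatrix} \mu^* \\ \tau_1^* \\ \tau_2^* \\ \beta_1^* \\ \beta_2^* \\ \gamma_{11}^* \\ \gamma_{12}^* \\ \gamma_{21}^* \\ \gamma_{22}^* \end{bmatrix} + \varepsilon
$$

which is expressed as

$$y = X^*\beta^* + \varepsilon$$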

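The solution to the normal equations for the data in Table 6.1 corresponding to the
sum-to-zero restrictions is $\hat{\beta}^* = (X^{*\prime}X^*)^{-1}X^{*\prime}y$. One obtains

$$\hat{\beta}^{*\prime} = [\hat{\mu}^*, \hat{\tau}_1^*, \hat{\tau}_2^*, \hat{\beta}_1^*, \hat{\beta}_2^*, \hat{\gamma}_{11}^*, \hat{\gamma}_{12}^*, \hat{\gamma}_{21}^*, \hat{\gamma}_{22}^*]$$

$$= [5.500, 2.333, -0.833, -2.000, -0.500, -1.333, 1.667, -0.667, -0.167]$$

The estimators for the remaining elements of $\beta^*$ are obtained from the restrictions as
follows:

$$\hat{\tau}_3^* = -\hat{\tau}_1^* - \hat{\tau}_2^* = -1.500, \quad \hat{\beta}_3^* = -\hat{\beta}_1^* - \hat{\beta}_2^* = 2.500$$

$$\hat{\gamma}_{13}^* = -\hat{\gamma}_{11}^* - \hat{\gamma}_{12}^* = -0.333, \quad \hat{\gamma}_{23}^* = -\hat{\gamma}_{21}^* - \hat{\gamma}_{22}^* = 0.833$$

$$\hat{\gamma}_{31}^* = -\hat{\gamma}_{11}^* - \hat{\gamma}_{21}^* = 2.000, \quad \hat{\gamma}_{32}^* = -\hat{\gamma}_{12}^* - \hat{\gamma}_{22}^* = -1.500$$

$$\hat{\gamma}_{33}^* = \hat{\gamma}_{11}^* + \hat{\gamma}_{12}^* + \hat{\gamma}_{21}^* + \hat{\gamma}_{22}^* = -0.500$$

In relation to the means model parameters, $\mu_{ij}$, the parameters $\mu^*$, $\tau_i^*$, $\beta_j^*$, and $\gamma_{ij}^*$ can be
selected to satisfy the sum-to-zero restrictions by defining

$$\mu^* = \bar{\mu}_{\cdot\cdot}, \quad \tau_i^* = \bar{\mu}_{i\cdot} - \bar{\mu}_{\cdot\cdot}, \quad \beta_j^* = \bar{\mu}_{\cdot j} - \bar{\mu}_{\cdot\cdot}, \quad \text{and} \quad \gamma_{ij}^* = \mu_{ij} - \bar{\mu}_{i\cdot} - \bar{\mu}_{\cdot j} + \bar{\mu}_{\cdot\cdot}$$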
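
The sum-to-zero solution for Table 6.1 can be reproduced numerically. The sketch below is
one way to build the reparameterized design matrix in Python; the coding table `code`, the
use of `np.einsum` for the interaction columns, and all variable names are choices made for
this illustration rather than anything prescribed by the text.

```python
import numpy as np

# Table 6.1 as (row treatment, column treatment, response).
obs = [(1, 1, 3), (1, 1, 6), (1, 2, 9), (1, 3, 10),
       (2, 1, 2), (2, 2, 5), (2, 2, 3), (2, 3, 8),
       (3, 1, 4), (3, 2, 2), (3, 3, 6)]
y = np.array([v for _, _, v in obs], dtype=float)

# Sum-to-zero coding of a three-level factor: the third level is minus the
# sum of the first two, which is how tau_3 = -tau_1 - tau_2 enters the model.
code = np.array([[1, 0], [0, 1], [-1, -1]], dtype=float)
R = code[[i - 1 for i, _, _ in obs]]      # columns for tau_1, tau_2
C = code[[j - 1 for _, j, _ in obs]]      # columns for beta_1, beta_2
# Interaction columns gamma_11, gamma_12, gamma_21, gamma_22 are products of
# one row-treatment column and one column-treatment column.
G = np.einsum("ni,nj->nij", R, C).reshape(len(obs), 4)

X_star = np.column_stack([np.ones(len(obs)), R, C, G])
beta_star = np.linalg.solve(X_star.T @ X_star, X_star.T @ y)

# Remaining effects follow from the restrictions.
tau_3 = -beta_star[1] - beta_star[2]
beta_3 = -beta_star[3] - beta_star[4]
gamma_33 = beta_star[5:].sum()
```

Rounded to three decimals, `beta_star` agrees with the solution vector quoted above.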

6.2.3 Set-to-Zero Restrictions 

Another reparameterization technique often used to solve the two-way effects model's
normal equations uses restrictions that set the last parameter in each set equal to zero (the
last parameter is selected for convenience; one could also use the first, or second, or any
other). For the two-way effects model for the data in Table 6.1, the restrictions are:

$$\tau_3 = 0, \quad \beta_3 = 0, \quad \gamma_{13} = 0, \quad \gamma_{23} = 0, \quad \gamma_{33} = 0, \quad \gamma_{31} = 0, \quad \text{and} \quad \gamma_{32} = 0$$

The resulting reparameterized model obtained by incorporating the above restrictions
into the two-way effects model is

$$
y = \begin{bmatrix}
1 & 1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 \\
1 & 1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 1 & 0 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 1 & 0 & 0 & 0 & 1 & 0 \\
1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 1 \\
1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 1 \\
1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\begin{bmatrix} \mu^+ \\ \tau_1^+ \\ \tau_2^+ \\ \beta_1^+ \\ \beta_2^+ \\ \gamma_{11}^+ \\ \gamma_{12}^+ \\ \gamma_{21}^+ \\ \gamma_{22}^+ \end{bmatrix} + \varepsilon
$$

which can be expressed as $y = X^+\beta^+ + \varepsilon$. The matrix $X^+$ is obtained from the full design
matrix of the two-way effects model, $X$, by deleting the columns corresponding to $\tau_3$, $\beta_3$,
$\gamma_{31}$, $\gamma_{32}$, $\gamma_{33}$, $\gamma_{13}$, and $\gamma_{23}$. The process of obtaining the design matrix for the reparameterized
model corresponding to the set-to-zero restrictions is much simpler than obtaining
the design matrix for the sum-to-zero restrictions. The solution to the normal equations
corresponding to the set-to-zero restrictions is

$$\hat{\beta}^+ = (X^{+\prime}X^+)^{-1}X^{+\prime}y$$
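One obtains

$$\hat{\beta}^{+\prime} = [\hat{\mu}^+, \hat{\tau}_1^+, \hat{\tau}_2^+, \hat{\beta}_1^+, \hat{\beta}_2^+, \hat{\gamma}_{11}^+, \hat{\gamma}_{12}^+, \hat{\gamma}_{21}^+, \hat{\gamma}_{22}^+]$$

$$= [6.0, 4.0, 2.0, -2.0, -4.0, -3.5, 3.0, -4.0, 0.0]$$

The estimates of the remaining parameters are zero since they were specified by the
set-to-zero restrictions, that is,

$$\hat{\tau}_3^+ = \hat{\beta}_3^+ = \hat{\gamma}_{31}^+ = \hat{\gamma}_{32}^+ = \hat{\gamma}_{33}^+ = \hat{\gamma}_{13}^+ = \hat{\gamma}_{23}^+ = 0$$

To relate the set-to-zero restrictions to the means model parameters, $\mu_{ij}$, define
$\mu^+$, $\tau_i^+$, $\beta_j^+$, and $\gamma_{ij}^+$ as

$$\mu^+ = \mu_{tb}, \quad \tau_i^+ = \mu_{ib} - \mu_{tb}, \quad \beta_j^+ = \mu_{tj} - \mu_{tb}, \quad \text{and} \quad \gamma_{ij}^+ = \mu_{ij} - \mu_{ib} - \mu_{tj} + \mu_{tb}$$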
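
The set-to-zero solution for Table 6.1 can be checked in the same spirit. In the Python sketch
below, the only change from a sum-to-zero coding is that the last level of each factor is
coded as a row of zeros; NumPy and all names are assumptions of the example.

```python
import numpy as np

# Table 6.1 as (row treatment, column treatment, response).
obs = [(1, 1, 3), (1, 1, 6), (1, 2, 9), (1, 3, 10),
       (2, 1, 2), (2, 2, 5), (2, 2, 3), (2, 3, 8),
       (3, 1, 4), (3, 2, 2), (3, 3, 6)]
y = np.array([v for _, _, v in obs], dtype=float)

# Set-to-zero coding: the last level gets all zeros, so its effect is dropped.
code = np.array([[1, 0], [0, 1], [0, 0]], dtype=float)
R = code[[i - 1 for i, _, _ in obs]]      # columns for tau_1, tau_2
C = code[[j - 1 for _, j, _ in obs]]      # columns for beta_1, beta_2
# Interaction columns gamma_11, gamma_12, gamma_21, gamma_22.
G = np.einsum("ni,nj->nij", R, C).reshape(len(obs), 4)

X_plus = np.column_stack([np.ones(len(obs)), R, C, G])
beta_plus = np.linalg.solve(X_plus.T @ X_plus, X_plus.T @ y)

# Under this coding the intercept is the mean of the last cell, mu_33.
mu_33 = y[-1]
```

The entries of `beta_plus` match the vector quoted above, and the first entry equals the
single observation in cell (3, 3).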

Thus there are several possible solutions to the normal equations when $X'X$ is not of full
rank (that is, when $X'X$ is singular). This occurs because the model is overparameterized;
that is, there are more parameters in the model (16 in the case of the two-way effects
model) than can be uniquely estimated from the available data (there are nine cells of data,
so one can estimate at most nine parameters). The number of parameters that can be estimated
uniquely might be called the number of essential parameters. To cope with the
overparameterized model and nonunique least squares solutions, the concept of estimability
must be considered, which is the topic of Section 6.3. The next example presents the two
possible solutions for the effects model for a one-way treatment structure.


6.2.4 Example 6.2: A One-Way Treatment Structure 

This is an example of a one-way treatment structure with four treatments in a completely
randomized design structure. The data are shown in Table 6.2. The $X^*$ matrix is constructed
by reparameterizing the model by using the sum-to-zero restrictions; that is, it is assumed
that $\tau_1^* + \tau_2^* + \tau_3^* + \tau_4^* = 0$.


TABLE 6.2

Data for One-Way Treatment Structure for Means and Effects
Models in Section 6.2.4

Treatment 1   Treatment 2   Treatment 3   Treatment 4
2.2           2.4           1.8           1.9
2.4           2.6           1.7           2.0
2.5           3.0           1.8           2.3
2.3           3.1           1.6           2.1
2.0           2.5           1.4           1.9
1.9                         1.9           2.0
1.9                                       2.4
                                          1.5
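The resulting design matrix is:

$$
X^* = \left[\begin{array}{rrrr}
1 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 \\
1 & 0 & 1 & 0 \\
1 & 0 & 1 & 0 \\
1 & 0 & 1 & 0 \\
1 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 \\
1 & 0 & 0 & 1 \\
1 & 0 & 0 & 1 \\
1 & 0 & 0 & 1 \\
1 & 0 & 0 & 1 \\
1 & 0 & 0 & 1 \\
1 & -1 & -1 & -1 \\
1 & -1 & -1 & -1 \\
1 & -1 & -1 & -1 \\
1 & -1 & -1 & -1 \\
1 & -1 & -1 & -1 \\
1 & -1 & -1 & -1 \\
1 & -1 & -1 & -1 \\
1 & -1 & -1 & -1
\end{array}\right]
$$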


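The normal equations for the sum-to-zero restriction model are

$$
\begin{bmatrix}
26 & -1 & -3 & -2 \\
-1 & 15 & 8 & 8 \\
-3 & 8 & 13 & 8 \\
-2 & 8 & 8 & 14
\end{bmatrix}
\begin{bmatrix} \hat{\mu}^* \\ \hat{\tau}_1^* \\ \hat{\tau}_2^* \\ \hat{\tau}_3^* \end{bmatrix} =
\begin{bmatrix} 55.1 \\ -0.9 \\ -2.5 \\ -5.9 \end{bmatrix}
$$

The corresponding least squares solution to the sum-to-zero restriction normal equations is

$$
\begin{bmatrix} \hat{\mu}^* \\ \hat{\tau}_1^* \\ \hat{\tau}_2^* \\ \hat{\tau}_3^* \end{bmatrix} =
\begin{bmatrix} 2.1510 \\ 0.0204 \\ 0.5690 \\ -0.4510 \end{bmatrix}
$$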
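and $\hat{\tau}_4^* = -\hat{\tau}_1^* - \hat{\tau}_2^* - \hat{\tau}_3^* = -0.1384$. In terms of the means model parameters, the
sum-to-zero restriction parameters can be expressed as

$$\mu^* = \bar{\mu}_{\cdot}, \quad \tau_1^* = \mu_1 - \bar{\mu}_{\cdot}, \quad \tau_2^* = \mu_2 - \bar{\mu}_{\cdot}, \quad \tau_3^* = \mu_3 - \bar{\mu}_{\cdot}, \quad \tau_4^* = \mu_4 - \bar{\mu}_{\cdot}$$

The set-to-zero restriction design matrix, $X^+$, with $\tau_4^+ = 0$, is constructed from $X^*$ by
replacing the "-1" values with "0" values. The normal equations for the set-to-zero restriction
model are

$$
\begin{bmatrix}
26 & 7 & 5 & 6 \\
7 & 7 & 0 & 0 \\
5 & 0 & 5 & 0 \\
6 & 0 & 0 & 6
\end{bmatrix}
\begin{bmatrix} \hat{\mu}^+ \\ \hat{\tau}_1^+ \\ \hat{\tau}_2^+ \\ \hat{\tau}_3^+ \end{bmatrix} =
\begin{bmatrix} 55.1 \\ 15.2 \\ 13.6 \\ 10.2 \end{bmatrix}
$$

The least squares solution for the set-to-zero restriction model is

$$
\begin{bmatrix} \hat{\mu}^+ \\ \hat{\tau}_1^+ \\ \hat{\tau}_2^+ \\ \hat{\tau}_3^+ \end{bmatrix} =
\begin{bmatrix} 2.0125 \\ 0.1589 \\ 0.7075 \\ -0.3125 \end{bmatrix}
$$

and $\hat{\tau}_4^+ = 0$. In terms of the means model parameters, the set-to-zero restriction parameters
can be expressed as

$$\mu^+ = \mu_4, \quad \tau_1^+ = \mu_1 - \mu_4, \quad \tau_2^+ = \mu_2 - \mu_4, \quad \tau_3^+ = \mu_3 - \mu_4, \quad \tau_4^+ = \mu_4 - \mu_4 = 0$$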
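
Both solutions for the Table 6.2 data can be reproduced with a few lines of Python. The
sketch below is illustrative only: the helper `fit`, the coding matrices, and the variable
names are assumptions of this example.

```python
import numpy as np

# Table 6.2 responses by treatment.
data = {1: [2.2, 2.4, 2.5, 2.3, 2.0, 1.9, 1.9],
        2: [2.4, 2.6, 3.0, 3.1, 2.5],
        3: [1.8, 1.7, 1.8, 1.6, 1.4, 1.9],
        4: [1.9, 2.0, 2.3, 2.1, 1.9, 2.0, 2.4, 1.5]}
y = np.concatenate([data[k] for k in (1, 2, 3, 4)])
trt = np.concatenate([[k - 1] * len(data[k]) for k in (1, 2, 3, 4)])

def fit(code):
    """Solve the normal equations for a full-rank coding of the 4 treatments."""
    X = np.column_stack([np.ones(len(y)), code[trt]])
    XtX, Xty = X.T @ X, X.T @ y
    return XtX, Xty, np.linalg.solve(XtX, Xty)

sum_code = np.vstack([np.eye(3), -np.ones(3)])   # treatment 4 coded -1 -1 -1
set_code = np.vstack([np.eye(3), np.zeros(3)])   # treatment 4 coded  0  0  0

XtX_star, Xty_star, b_star = fit(sum_code)
XtX_plus, Xty_plus, b_plus = fit(set_code)

# mu + tau_i under each restriction (tau_4 recovered from the restriction).
means_star = b_star[0] + np.append(b_star[1:], -b_star[1:].sum())
means_plus = b_plus[0] + np.append(b_plus[1:], 0.0)
```

The two coefficient vectors differ, but `means_star` and `means_plus` coincide: both equal
the four treatment sample means.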

The last parameter that needs to be estimated is the population variance $\sigma^2$. An estimate
of $\sigma^2$ based on the least squares solution for $\beta$ is

$$
\hat{\sigma}^2 = \frac{1}{n - r}(y - X\hat{\beta})'(y - X\hat{\beta})
= \frac{1}{n - r} \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{i1} - \cdots - \hat{\beta}_{p-1} x_{i,p-1})^2
\qquad (6.11)
$$

where $r = \text{rank}(X)$.

If the errors are assumed to be independently distributed with the first four moments
equal to the first four moments of a normal distribution, then $\hat{\sigma}^2$ is the best quadratic
unbiased estimate of $\sigma^2$. If the errors are also normally distributed, then $\hat{\sigma}^2$ is the best unbiased
estimator of $\sigma^2$ and the sampling distribution of $(n - r)\hat{\sigma}^2/\sigma^2$ is a central chi-square
distribution with $n - r$ degrees of freedom.
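
As an illustration of Equation 6.11, the following Python sketch estimates $\sigma^2$ for the
Table 6.2 data using the overspecified one-way effects model. The choice of
`np.linalg.lstsq` to pick a least squares solution and `np.linalg.matrix_rank` to obtain $r$
is an assumption of this example; any other solution of the normal equations would give the
same residuals.

```python
import numpy as np

# Table 6.2 responses by treatment.
data = {1: [2.2, 2.4, 2.5, 2.3, 2.0, 1.9, 1.9],
        2: [2.4, 2.6, 3.0, 3.1, 2.5],
        3: [1.8, 1.7, 1.8, 1.6, 1.4, 1.9],
        4: [1.9, 2.0, 2.3, 2.1, 1.9, 2.0, 2.4, 1.5]}
y = np.concatenate([data[k] for k in (1, 2, 3, 4)])
trt = np.concatenate([[k - 1] * len(data[k]) for k in (1, 2, 3, 4)])

# Overspecified effects model: column of 1s plus four treatment indicators.
X = np.column_stack([np.ones(len(y)), np.eye(4)[trt]])

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # one least squares solution
r = np.linalg.matrix_rank(X)                       # r = rank(X), here 4 rather than 5
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (len(y) - r)          # Equation 6.11
```

With $n = 26$ and $r = 4$ the divisor is 22, and `sigma2_hat` comes out at roughly 0.065.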


6.3 Estimability and Connected Designs 

When an overspecified model or a less than full rank model is used for an experimental
situation, there are many different least squares solutions (in fact there are an infinite number of
solutions). If two researchers analyzed the above two data sets, one using the sum-to-zero
restriction and the other the set-to-zero restriction, they might appear to obtain different
conclusions. For the two-way example in the last section, $\hat{\tau}_2^* = -0.833$ and $\hat{\tau}_2^+ = 2.000$; thus
one researcher might say that $\tau_2$ is most likely to be negative while the other might say that
$\tau_2$ is likely to be positive, and both statements are incorrect.


6.3.1 Estimable Functions 

Since both researchers are analyzing the same data set, it seems that they should consider 
only parameters or functions of the parameters that have identical estimates for both 
reparameterized models. Such functions of the parameters are called estimable functions of 
the parameters. 

Definition 6.3.1: A parameter or function of the parameters $f(\beta)$ is estimable if and only
if the estimate of the parameter or function of parameters is invariant with respect to the
choice of a least squares solution; that is, the value of the estimate is the same regardless of
which solution to the normal equations is used.

If two researchers obtain an estimate of an estimable function of the parameters, they
both will obtain the same value even if they have two different least squares solutions.
Therefore, they will make the same decisions about estimable functions of the parameters.
For matrix models, linear estimable functions of $\beta$ take on the form of linear combinations
of the parameter vector such as $a'\beta$ where $a$ is a $p \times 1$ vector of constants. A linear function
$a'\beta$ is estimable if and only if there exists a vector $r$ such that $a = X'Xr$. Each function $x_i'\beta$ is
estimable where $x_i'$ is the $i$th row of $X$. Also, any linear combination of the $x_i'\beta$'s is an
estimable function. Consider the two solutions obtained for the one-way example in Section
6.2. Because there are two different solutions for each of the parameters $\mu$, $\tau_1$, $\tau_2$, $\tau_3$, and $\tau_4$,
these parameters are considered to be nonestimable, but by computing the estimate of $\mu + \tau_i$
from each method, it is seen that

$$\hat{\mu}^* + \hat{\tau}_i^* = \hat{\mu}^+ + \hat{\tau}_i^+, \quad i = 1, 2, 3, 4$$

demonstrating that $\mu + \tau_i$ is an estimable function of the parameters.

All contrasts of the $\tau_i$, such as the differences $\tau_1 - \tau_2$, $\tau_1 - \tau_3$, $\tau_2 - \tau_3$, or $\sum_{i=1}^{t} c_i\tau_i$ where
$\sum_{i=1}^{t} c_i = 0$, can be shown to be estimable functions for the one-way model.
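
The condition $a = X'Xr$ lends itself to a direct numerical check. The Python sketch below
tests a few linear functions for the one-way effects model with the group sizes of Table 6.2.
The function name `is_estimable`, the tolerance, and the use of `np.linalg.lstsq` to look for
a vector $r$ are assumptions of this illustration.

```python
import numpy as np

def is_estimable(a, X, tol=1e-8):
    """Return True if a'beta is estimable, i.e. a = X'X r has a solution r."""
    XtX = X.T @ X
    # Least squares attempt at solving X'X r = a; exact only if a solution exists.
    r, *_ = np.linalg.lstsq(XtX, a, rcond=None)
    return bool(np.allclose(XtX @ r, a, atol=tol))

# One-way effects model with the group sizes of Table 6.2 (7, 5, 6, 8).
trt = np.repeat(np.arange(4), [7, 5, 6, 8])
X = np.column_stack([np.ones(len(trt)), np.eye(4)[trt]])

# Coefficient vectors a for (mu, tau_1, tau_2, tau_3, tau_4).
checks = {
    "mu + tau_1": np.array([1.0, 1.0, 0.0, 0.0, 0.0]),
    "tau_1 - tau_2": np.array([0.0, 1.0, -1.0, 0.0, 0.0]),
    "tau_1": np.array([0.0, 1.0, 0.0, 0.0, 0.0]),
    "mu": np.array([1.0, 0.0, 0.0, 0.0, 0.0]),
}
results = {name: is_estimable(a, X) for name, a in checks.items()}
```

In this sketch the treatment mean and the contrast are reported as estimable, while $\mu$
and $\tau_1$ on their own are not, in line with the discussion above.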

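For the two-way effects model, some estimable functions are

$$\mu + \tau_i + \beta_j + \gamma_{ij}, \quad \gamma_{ij} - \gamma_{ij'} - \gamma_{i'j} + \gamma_{i'j'}, \quad \beta_j - \beta_{j'} + \bar{\gamma}_{\cdot j} - \bar{\gamma}_{\cdot j'}, \quad \tau_i - \tau_{i'} + \bar{\gamma}_{i\cdot} - \bar{\gamma}_{i'\cdot}$$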

Estimable functions are discussed in more detail in Chapter 10. The important thing to 
remember here is the definition of an estimable function. In making inferences from a data 
set, one must consider only functions of the parameters that are estimable, since they are 
functions of the parameters with estimates that do not depend on which least squares 
solution is chosen. SAS®-GLM and SAS®-Mixed both check to make sure that a parameter 
or function of the parameters that is being requested to be estimated is in fact estimable. 
If the parameter or function of the parameters is not estimable, no estimate is provided. 
[Figure 6.1: two panels, "I. Connected" and "II. Not connected", each a grid of row treatments by column treatments in which "X" denotes an observed treatment combination.]

FIGURE 6.1 Connected and unconnected two-way treatment structures.


6.3.2 Connectedness 

Another concept related to estimable functions is that of connectedness of a two-way treatment
structure. If it can be assumed that the levels of the row and column treatments do
not interact, then the treatment combination means can be modeled as

    $\mu_{ij} = \mu + \tau_i + \beta_j$,  $i = 1, 2, \ldots, b$,  $j = 1, 2, \ldots, t$    (6.12)

A two-way treatment structure is said to be connected if and only if data occur in the
two-way cells in such a way that $\beta_j - \beta_{j'}$ and $\tau_i - \tau_{i'}$ are estimable for all $j \ne j'$ and $i \ne i'$ for
model (6.12). Arrangement I in Figure 6.1 is a connected experiment, while arrangement II
is not. For example, using arrangement I,

    $\beta_1 - \beta_2 = (\mu + \tau_1 + \beta_1) - (\mu + \tau_1 + \beta_3) + (\mu + \tau_2 + \beta_3) - (\mu + \tau_2 + \beta_2)$

a linear combination of the cell means, thus $\beta_1 - \beta_2$ is estimable. No such linear combination
of the cell means for arrangement II provides $\beta_1 - \beta_2$, thus $\beta_1 - \beta_2$ is not estimable. The
next section discusses the testing of hypotheses about estimable functions of the
parameters.
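
Connectedness can also be checked mechanically. The sketch below relies on a standard equivalent characterization that is not stated in this section and is assumed here: under the additive model, the structure is connected exactly when every row and column can be reached from every other through a chain of observed cells. The function name `is_connected` and both example patterns are hypothetical; they are not the arrangements of Figure 6.1.

```python
from collections import deque

def is_connected(cells):
    """Return True if the observed (row, column) cells link all rows and columns.

    Rows and columns are treated as nodes of a graph and each observed cell as
    an edge between its row and its column. Assumes every row and column of
    interest appears in at least one observed cell.
    """
    cells = set(cells)
    if not cells:
        return False
    neighbors = {}
    for r, c in cells:
        neighbors.setdefault(("row", r), set()).add(("col", c))
        neighbors.setdefault(("col", c), set()).add(("row", r))
    start = next(iter(neighbors))
    seen = {start}
    queue = deque([start])
    while queue:  # breadth-first search over rows and columns
        node = queue.popleft()
        for nxt in neighbors[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen) == len(neighbors)

# Hypothetical pattern in which every row shares a column with another row.
pattern_a = [(1, 1), (1, 3), (2, 2), (2, 3), (3, 1), (3, 2)]
# Hypothetical pattern that splits into two blocks with no shared row or column.
pattern_b = [(1, 1), (1, 2), (2, 1), (2, 2), (3, 3), (3, 4), (4, 3), (4, 4)]

print(is_connected(pattern_a))
print(is_connected(pattern_b))
```

In the second pattern no chain of observed cells joins columns 1 and 3, so a difference such as $\beta_1 - \beta_3$ cannot be written as a linear combination of observed cell means.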


6.4 Testing Hypotheses about Linear Model Parameters 

There are several ways to develop appropriate statistics for testing hypotheses about linear 
functions of the parameters of a linear model. The method used here, which is expressed 
in matrix notation, is equivalent to the principle of conditional error (Chapter 1) and the
likelihood ratio statistic. The discussion is limited to testing hypotheses about estimable
functions of the parameters. In particular, consider testing the hypothesis

    $H_0: H\beta = h$  vs  $H_a: H\beta \ne h$    (6.13)

where the linear combinations $H\beta$ are estimable functions of $\beta$ and $H$ is a $q \times p$ matrix
of rank $q$ (that is, all of the rows of $H$ are linearly independent). The corresponding test
statistic is


    $F_c = \dfrac{SSH_0/q}{\hat\sigma^2}$    (6.14)

where $\hat\sigma^2$ was given by Equation 6.11 and

    $SSH_0 = (H\hat\beta - h)'[H(X'X)^{-}H']^{-1}(H\hat\beta - h)$    (6.15)

which is called the sum of squares due to deviations from the null hypothesis [the notation
"$(X'X)^{-}$" denotes a generalized inverse of the matrix $X'X$, Graybill, 1976]. Under the assumption
that the elements of the error vector are i.i.d. $N(0, \sigma^2)$, $F_c$ is distributed as an F-distribution
with $q$ and $n - r$ degrees of freedom.
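
As an illustration of Equations 6.14 and 6.15, the sketch below treats the simplest case: a single linear combination ($q = 1$) of the treatment means in a one-way means model, where $X'X$ is diagonal with the sample sizes on the diagonal, so no matrix inversion is needed. The function name `contrast_f` and the data are hypothetical.

```python
from statistics import mean

def contrast_f(groups, coef, h=0.0):
    """F statistic for H0: sum(coef_i * mu_i) = h in a one-way means model.

    Returns (F, numerator df, denominator df).
    """
    n = [len(g) for g in groups]
    ybar = [mean(g) for g in groups]
    total_n, t = sum(n), len(groups)
    # Residual sum of squares and the usual estimate of sigma^2.
    sse = sum((y - m) ** 2 for g, m in zip(groups, ybar) for y in g)
    sigma2_hat = sse / (total_n - t)
    estimate = sum(c * m for c, m in zip(coef, ybar))
    # H (X'X)^{-1} H' reduces to sum(coef_i^2 / n_i) because X'X = diag(n_i).
    ssh0 = (estimate - h) ** 2 / sum(c * c / k for c, k in zip(coef, n))
    q = 1
    return (ssh0 / q) / sigma2_hat, q, total_n - t

# Hypothetical data: three treatments, three observations each.
groups = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]
f_value, df_num, df_den = contrast_f(groups, coef=[1, -1, 0])
print(f_value, df_num, df_den)
```

For these data the residual mean square is 1 and the contrast sum of squares is $(2 - 3)^2 / (1/3 + 1/3) = 1.5$, so the statistic is 1.5 on 1 and 6 degrees of freedom. Hypotheses with $q > 1$ require the full matrix form of Equation 6.15.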

The hypothesis in Equation 6.13 can always be equivalently stated in terms of a reparameterized
model $y = X^*\beta^* + \varepsilon$, where $X^{*\prime}X^*$ is nonsingular, as

    $H_0: H^*\beta^* = h^*$  vs  $H_a: H^*\beta^* \ne h^*$

Then the $SSH_0$ of Equation 6.15 can be computed as

    $SSH_0 = (H^*\hat\beta^* - h^*)'[H^*(X^{*\prime}X^*)^{-1}H^{*\prime}]^{-1}(H^*\hat\beta^* - h^*)$

For a one-way model, testing the hypothesis 

    $H_0: \tau_1 = \tau_2 = \cdots = \tau_t$  vs  $H_a: \tau_i \ne \tau_{i'}$ for some $i \ne i'$

in the original effects model is equivalent to testing

    $H_0: \tau_1^* = \tau_2^* = \cdots = \tau_{t-1}^* = 0$  vs  $H_a: \tau_i^* \ne 0$ for some $i \le t - 1$


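in the sum-to-zero reparameterized model. The null hypothesis in terms of $\beta^*$ for
Example 6.2 is

    $\begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \mu^* \\ \tau_1^* \\ \tau_2^* \\ \tau_3^* \end{bmatrix} = 0$  or  $H^*\beta^* = 0$

and the sum of squares due to deviations from $H_0$ is

    $SSH_0 = (\hat\tau_1^*\ \ \hat\tau_2^*\ \ \hat\tau_3^*)\, Z^{-1}\, (\hat\tau_1^*\ \ \hat\tau_2^*\ \ \hat\tau_3^*)'$

where $Z = H^*(X^{*\prime}X^*)^{-1}H^{*\prime}$, which is the portion of $(X^{*\prime}X^*)^{-1}$ corresponding to the rows and
columns associated with $(\tau_1^*, \tau_2^*, \tau_3^*)$.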

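A $(1 - \alpha)100\%$ confidence interval about an estimable function $a'\beta$ is

    $a'\hat\beta - t_{\alpha/2,\,n-p}\, S_{a'\hat\beta} \le a'\beta \le a'\hat\beta + t_{\alpha/2,\,n-p}\, S_{a'\hat\beta}$

where $S^2_{a'\hat\beta} = \hat\sigma^2 a'(X'X)^{-}a$. Simultaneous confidence intervals can be constructed about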
several estimable functions by using one of the multiple comparison procedures discussed 
in Chapter 3. 


6.5 Population Marginal Means 

After analyzing a cross-classified data set via an analysis of variance, the experimenter is 
usually interested in estimating the means of particular effects or cells. The population 
marginal mean is defined to be a linear combination of the parameters averaged over specified
classes as if there were one observation in each cell (Searle et al., 1980). If every cell has
at least one observation, then all population marginal means are estimable, whereas they 
are not necessarily estimable if some cells are empty. This definition does not depend on 
the sample sizes in the cells. If the data represent a proportional sampling of cells, then the 
experimenter might want to consider a weighted average of the cell means where the 
weights are given by the sample sizes (see Chapter 10). Or there could be some other 
weighting scheme that needs to be used to average over the various cells. An example is 
presented at the end of this section to demonstrate different possibilities. 

For a one-way treatment structure in a completely randomized design, the population
marginal mean for the $i$th treatment is $\mu + \tau_i = \mu_i$ and is estimated by

    $\widehat{\mu + \tau_i} = \hat\mu + \hat\tau_i$

where $\hat\mu$ and $\hat\tau_i$ are obtained from any solution to the normal equations. These estimated
values are called estimated population marginal means.

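For a two-way treatment structure in a completely randomized design, the population
marginal mean for the $(i, j)$th cell is $\mu_{ij} = \mu + \tau_i + \beta_j + \gamma_{ij}$. The population marginal mean for
row $i$ is the average of the $\mu_{ij}$ in that row, or

    $\bar\mu_{i\cdot} = \sum_{j=1}^{b} \mu_{ij}/b = \mu + \tau_i + \bar\beta_{\cdot} + \bar\gamma_{i\cdot}$

The population marginal mean for column $j$ is the average of the $\mu_{ij}$ in that column, or

    $\bar\mu_{\cdot j} = \sum_{i=1}^{t} \mu_{ij}/t = \mu + \bar\tau_{\cdot} + \beta_j + \bar\gamma_{\cdot j}$

The estimates of the population marginal means are

    $\hat\mu_{ij} = \hat\mu + \hat\tau_i + \hat\beta_j + \hat\gamma_{ij}$,

    $\hat{\bar\mu}_{i\cdot} = \hat\mu + \hat\tau_i + \sum_{j=1}^{b} \hat\beta_j/b + \sum_{j=1}^{b} \hat\gamma_{ij}/b$,

and

    $\hat{\bar\mu}_{\cdot j} = \hat\mu + \sum_{i=1}^{t} \hat\tau_i/t + \hat\beta_j + \sum_{i=1}^{t} \hat\gamma_{ij}/t$, respectively.

These estimates are unique for any of the possible least squares solutions

    $\hat\mu, \hat\tau_1, \hat\tau_2, \ldots, \hat\tau_t, \hat\beta_1, \hat\beta_2, \ldots, \hat\beta_b, \hat\gamma_{11}, \hat\gamma_{12}, \ldots, \hat\gamma_{tb}$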

The estimate of the population marginal mean $\bar\mu_{1\cdot}$ for the two-way example in Table 6.1,
computed from the sum-to-zero restricted model, is

    $\hat{\bar\mu}_{1\cdot} = \hat\mu^* + \hat\tau_1^* + \dfrac{\hat\beta_1^* + \hat\beta_2^* + \hat\beta_3^*}{3} + \dfrac{\hat\gamma_{11}^* + \hat\gamma_{12}^* + \hat\gamma_{13}^*}{3}$

    $= 5.500 + 2.333 + \dfrac{-2.000 - 0.500 + 2.500}{3} + \dfrac{-1.333 + 1.667 - 0.333}{3}$

    $= 7.833$


One obtains the same value for $\bar\mu_{1\cdot}$ when the set-to-zero least squares solution is used, as
expected, since $\bar\mu_{1\cdot}$ is estimable for this example.

When there are no empty cells, then all population marginal means are estimable. If 
there are empty cells, then any population marginal mean involving one or more of the 
missing cells is not estimable. For example, if the (2, 2) cell is missing in a 2 x 2 treatment 
structure, there is no information about $\mu_{22}$ and hence $\mu_{22}$ is not estimable. The population
marginal mean for column 2 is

    $\bar\mu_{\cdot 2} = \dfrac{\mu_{12} + \mu_{22}}{2}$

Because $\bar\mu_{\cdot 2}$ depends on $\mu_{22}$, it follows that $\bar\mu_{\cdot 2}$ is not estimable ($\bar\mu_{2\cdot}$ is not estimable either).
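
The rule just described is easy to apply to a table of estimated cell means. The following sketch uses a hypothetical helper, `marginal_means`, and hypothetical cell means for a 2 x 2 structure whose (2, 2) cell is empty; a marginal mean that involves an empty cell is reported as `None` to indicate that it is not estimable.

```python
def marginal_means(cell_means):
    """Unweighted row and column averages of a table of cell means.

    cell_means is a list of rows; None marks an empty cell. Any marginal
    mean that involves an empty cell is returned as None.
    """
    def average(values):
        if any(v is None for v in values):
            return None
        return sum(values) / len(values)

    rows = [average(row) for row in cell_means]
    cols = [average(col) for col in zip(*cell_means)]
    return rows, cols

# Hypothetical estimated cell means; cell (2, 2) was not observed.
rows, cols = marginal_means([[10.0, 14.0], [12.0, None]])
print(rows)  # [12.0, None]
print(cols)  # [11.0, None]
```

Only row 1 and column 1 have population marginal means that can be estimated here, matching the statement that neither $\bar\mu_{\cdot 2}$ nor $\bar\mu_{2\cdot}$ is estimable.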

Any population marginal mean that is estimable can be expressed as a linear combination
of the elements of the parameter vector of a reparameterized model; that is, the population
marginal mean can be expressed as $a'\beta^*$ for a proper choice of $a$. The variance of the
estimated population marginal mean is

    $\mathrm{Var}(a'\hat\beta^*) = \sigma^2 a'(X^{*\prime}X^*)^{-1}a$

and the estimated standard error of $a'\hat\beta^*$ is

    $\widehat{\mathrm{s.e.}}(a'\hat\beta^*) = \hat\sigma\sqrt{a'(X^{*\prime}X^*)^{-1}a}$


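For the one-way model with the set-to-zero restriction, the estimated population marginal
means are

    $\widehat{\mu + \tau_i} = \hat\mu^* + \hat\tau_i^*$,  $i = 1, 2, \ldots, t - 1$

and

    $\widehat{\mu + \tau_t} = \hat\mu^*$

The variances of these estimated population marginal means are

    $\mathrm{Var}(\widehat{\mu + \tau_i}) = \mathrm{Var}(\hat\mu^*) + 2\,\mathrm{Cov}(\hat\mu^*, \hat\tau_i^*) + \mathrm{Var}(\hat\tau_i^*)$,  $i = 1, 2, \ldots, t - 1$

and

    $\mathrm{Var}(\widehat{\mu + \tau_t}) = \mathrm{Var}(\hat\mu^*)$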

The data in Table 6.3 are grade point averages (GPA) of a sample of students from a 
school where the students were classified by year in school and sex. The two-way model 

    $y_{ijk} = \mu + \tau_i + \beta_j + \gamma_{ij} + \varepsilon_{ijk}$,  $i = 1, 2, 3, 4$,  $j = 1, 2$,  $k = 1, 2, \ldots, n_{ij}$

is used to describe the data where the $y_{ijk}$ are the GPA values, $\tau_i$ is the effect for the $i$th year,
$\beta_j$ is the effect of the $j$th sex, and $\gamma_{ij}$ is the interaction effect. The set-to-zero least squares
solution for the parameters is displayed in Table 6.4. The estimated standard error is zero
for those parameters that are set to zero. The estimate of the cell mean for year 1 and
female is

    $\hat\mu_{1f} = \hat\mu + \hat\tau_1 + \hat\beta_f + \hat\gamma_{1f} = 3.333 - 0.133 - 0.033 + 0.205 = 3.372$


and the remaining cell means can be computed similarly, and they are displayed in Table 
6.5. Estimates of the marginal year and sex means are provided in Tables 6.6 and 6.7. The 


TABLE 6.3

Grade Point Average Data for Two-Way Treatment Structure

   Freshmen         Sophomores        Juniors          Seniors
Female   Male     Female   Male     Female   Male     Female   Male
  3.3    2.3       3.3     3.1       3.2     2.9       3.7     3.4
  3.5    3.2       3.8     3.9       3.5     3.3       2.9     3.8
  3.8    3.4       2.3     2.8       3.9     3.6        —      2.8
  3.7    3.8       3.0     3.4       3.3      —         —       —
  2.8    3.3       3.5      —         —       —         —       —
  3.3     —         —       —         —       —         —       —
  3.2     —         —       —         —       —         —       —
TABLE 6.4

Least Squares Solution for GPA Data from Two-Way Treatment Structure

Parameter          Estimate    Standard Error
Intercept            3.333         0.264
Year 1              -0.133         0.334
Year 2              -0.033         0.349
Year 3              -0.067         0.373
Year 4               0.000           —
Sex f               -0.033         0.417
Sex m                0.000           —
Year x sex 1 f       0.205         0.498
Year x sex 1 m       0.000           —
Year x sex 2 f      -0.087         0.518
Year x sex 2 m       0.000           —
Year x sex 3 f       0.242         0.544
Year x sex 3 m       0.000           —
Year x sex 4 f       0.000           —
Year x sex 4 m       0.000           —


TABLE 6.5

Cell Means with Estimated Standard Errors

Year    Sex    Cell Size    Cell Mean    Standard Error
 1       f         7          3.371          0.173
 1       m         5          3.200          0.204
 2       f         5          3.180          0.204
 2       m         4          3.300          0.229
 3       f         4          3.475          0.229
 3       m         3          3.267          0.264
 4       f         2          3.300          0.323
 4       m         3          3.333          0.264
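
The cell means and standard errors in Table 6.5 can be reproduced directly from the observations. The sketch below assumes the GPA values of Table 6.3 grouped by year and sex, pools the within-cell variation to estimate $\sigma$, and uses the fact that for a cell mean the standard error formula reduces to $\hat\sigma / \sqrt{n_{ij}}$. Variable names are illustrative.

```python
from math import sqrt
from statistics import mean

# GPA observations by (year, sex), as listed in Table 6.3.
gpa = {
    (1, "f"): [3.3, 3.5, 3.8, 3.7, 2.8, 3.3, 3.2],
    (1, "m"): [2.3, 3.2, 3.4, 3.8, 3.3],
    (2, "f"): [3.3, 3.8, 2.3, 3.0, 3.5],
    (2, "m"): [3.1, 3.9, 2.8, 3.4],
    (3, "f"): [3.2, 3.5, 3.9, 3.3],
    (3, "m"): [2.9, 3.3, 3.6],
    (4, "f"): [3.7, 2.9],
    (4, "m"): [3.4, 3.8, 2.8],
}

n_total = sum(len(obs) for obs in gpa.values())
cell_mean = {cell: mean(obs) for cell, obs in gpa.items()}

# Pooled residual sum of squares; degrees of freedom = observations - cells.
sse = sum((y - cell_mean[cell]) ** 2 for cell, obs in gpa.items() for y in obs)
sigma_hat = sqrt(sse / (n_total - len(gpa)))

for cell, obs in gpa.items():
    std_err = sigma_hat / sqrt(len(obs))
    print(cell, len(obs), round(cell_mean[cell], 3), round(std_err, 3))
```

The printed values should agree with Table 6.5 up to rounding; for example, the year 1 female cell has seven observations and a mean near 3.371.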


TABLE 6.6

Raw, Least Squares, and Weighted Means for the Years Averaged over
Levels of Sex

Year    Raw Mean    LS Mean    Weighted
 1       3.300       3.286      3.257
 2       3.233       3.240      3.245
 3       3.386       3.371      3.357
 4       3.320       3.317      3.319
TABLE 6.7

Raw, Least Squares, and Weighted Means for the Females
and Males Averaged over Levels of Year

Sex    Raw Mean    LS Mean    Weighted
 f      3.333       3.332      3.327
 m      3.267       3.275      3.261


raw means are weighted averages of the respective cell means using the numbers of observations
in the cells as weights. For example, the raw mean or unadjusted mean for year 1
is computed as

    $\hat{\bar\mu}_{1\cdot} = \dfrac{7 \times 3.371 + 5 \times 3.200}{7 + 5} = 3.300$

The least squares means for the year and sex effects are the unweighted averages of the
respective cell means, and the least squares mean for year 1 is

    $\hat{\bar\mu}_{1\cdot} = \dfrac{3.371 + 3.200}{2} = 3.286$


When making statements about the marginal year or marginal sex effects one may be 
interested in using either the raw means (weighting by observed cell size) or the least 
squares means that weight each cell mean equally (as if there were equal cell sizes). For 
designed experiments, the least squares means are likely to be the means of interest 
because you most likely designed the experiment with equal numbers of observations per 
cell, and so providing estimates of the marginal means as if the cell sizes were equal is a 
reasonable solution, even though some of the data may be missing. But if the data are from 
an observational study, the unweighted means may not be the marginal means of interest. 
If the data were from a simple random sample from the population, then the cell sizes may 
reflect the proportional membership structure of the population. If that is the case, the raw 
means or means weighted by sample sizes are the marginal means of interest. But if the 
sample sizes in the cells are not representative of the population structure, then neither the 
least squares means nor the raw means are of interest. This type of phenomenon occurs 
when some segments of the population are over- or undersampled by design or by chance. 
When the population structure is known, that is, the proportion of the population in each 
cell is known, the marginal means of interest are obtained by using the known population 
proportions as weights. For example, suppose the proportions of males and females in 
each of the years of study are as given in Table 6.8. The estimated marginal mean for year 
1 using the weights is 


    $\hat{\bar\mu}_{1\cdot} = \dfrac{11 \times 3.371 + 22 \times 3.200}{33} = 3.257$

The estimated marginal mean for females averaged over years using the weights is

    $\hat{\bar\mu}_{\cdot f} = \dfrac{11 \times 3.371 + 12 \times 3.180 + 10 \times 3.475 + 8 \times 3.300}{41} = 3.327$

The estimated marginal mean for males averaged over years using the weights is

    $\hat{\bar\mu}_{\cdot m} = \dfrac{22 \times 3.200 + 14 \times 3.300 + 13 \times 3.267 + 10 \times 3.333}{59} = 3.261$
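
The three kinds of marginal means differ only in the weights applied to the cell means. The sketch below computes all three from the cell sizes and cell means of Table 6.5 and the population percentages of Table 6.8; the helper `average` and the dictionary names are hypothetical.

```python
# Cell sizes and cell means (Table 6.5) and population percentages (Table 6.8).
size = {(1, "f"): 7, (1, "m"): 5, (2, "f"): 5, (2, "m"): 4,
        (3, "f"): 4, (3, "m"): 3, (4, "f"): 2, (4, "m"): 3}
cell_mean = {(1, "f"): 3.371, (1, "m"): 3.200, (2, "f"): 3.180, (2, "m"): 3.300,
             (3, "f"): 3.475, (3, "m"): 3.267, (4, "f"): 3.300, (4, "m"): 3.333}
pop = {(1, "f"): 11, (1, "m"): 22, (2, "f"): 12, (2, "m"): 14,
       (3, "f"): 10, (3, "m"): 13, (4, "f"): 8, (4, "m"): 10}
equal = {cell: 1 for cell in cell_mean}  # equal weights give the least squares means

def average(cells, weights):
    """Weighted average of the cell means over the listed cells."""
    total = sum(weights[c] for c in cells)
    return sum(weights[c] * cell_mean[c] for c in cells) / total

year1 = [c for c in cell_mean if c[0] == 1]
females = [c for c in cell_mean if c[1] == "f"]
males = [c for c in cell_mean if c[1] == "m"]

for label, cells in [("year 1", year1), ("female", females), ("male", males)]:
    raw = average(cells, size)     # weights = observed cell sizes
    ls = average(cells, equal)     # weights = 1 for every cell
    wtd = average(cells, pop)      # weights = known population proportions
    print(label, round(raw, 3), round(ls, 3), round(wtd, 3))
```

Because the inputs are cell means rounded to three decimals, the results should match Tables 6.6 and 6.7 to within about 0.001.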


It is not a problem to compute several types of adjusted means and then select those that 
you like, but it is necessary to specify the proportions in the population structure as correctly 


TABLE 6.8

Population Distribution of Students to Classes for the Grade Point
Average Study

           Freshmen    Sophomores    Juniors    Seniors    Total
Females      11%          12%          10%         8%       41%
Males        22%          14%          13%        10%       59%
Total        33%          26%          23%        18%



TABLE 6.9

SAS®-GLM Code with Estimate Statements to Provide Estimates of the Population
Marginal Means for Levels of Sex and Year Using the Weights in Table 6.8

proc glm data=ex6_4; class year sex;
model gpa=year sex year*sex/solution;
lsmeans year|sex/stderr;
means year|sex;

***assume the following is the population structure
***sex      year=  1   2   3   4   sum
***female         11  12  10   8    41
***male           22  14  13  10    59
***  sum          33  26  23  18;

estimate 'female pop' intercept 41 sex 41 0 year 11 12 10 8
   year*sex 11 0 12 0 10 0 8 0/divisor=41;

estimate 'male pop' intercept 59 sex 0 59 year 22 14 13 10
   year*sex 0 22 0 14 0 13 0 10/divisor=59;

estimate 'year 1 pop' intercept 33 sex 11 22 year 33 0 0 0
   year*sex 11 22 0 0 0 0 0 0/divisor=33;

estimate 'year 2 pop' intercept 26 sex 12 14 year 0 26 0 0
   year*sex 0 0 12 14 0 0 0 0/divisor=26;

estimate 'year 3 pop' intercept 23 sex 10 13 year 0 0 23 0
   year*sex 0 0 0 0 10 13 0 0/divisor=23;

estimate 'year 4 pop' intercept 18 sex 8 10 year 0 0 0 18
   year*sex 0 0 0 0 0 0 8 10/divisor=18;
run;
as possible and then compute the estimated marginal means using those weights. This can
be accomplished by software packages that allow the use of an "estimate" statement where 
the estimate and the estimated standard error of the desired marginal mean are provided. 
Table 6.9 contains code to use the SAS-GLM procedure to provide the estimates of the 
marginal means using the proportions in Table 6.8 as weights. 


6.6 Concluding Remarks 

This chapter, required only for those interested in a theoretical background, introduced 
least squares estimation procedures and discussed the important concept of estimability. 
Also discussed were the definition and estimation of population marginal means. This 
chapter provides general formulas for those who want to develop statistical software for 
their own use. 


6.7 Exercises 

6.1 For the following data set fit the model $y_{ij} = \mu + \tau_i + \beta_j + \varepsilon_{ij}$, $i = 1, 2, 3$ and
$j = 1, 2, 3$.

1) Obtain the estimates of the parameters that satisfy the set-to-zero solution. 

2) Obtain the estimates of the parameters that satisfy the sum-to-zero solution. 

3) Use the two solutions in parts 1) and 2) to verify that $\mu + \tau_2 + \beta_j$ is likely to be
estimable.

4) Use the two solutions in parts 1) and 2) to verify that $\tau_i - \tau_{i'}$ is likely to be
estimable.

5) Use the two solutions in parts 1) and 2) to verify that $\beta_j - \beta_{j'}$ is likely to be
estimable.

6) Use the two solutions in parts 1) and 2) to show that the row treatment and
column treatment marginal means are estimable.

7) Discuss the estimability of $\mu$, $\beta_j$, and $\tau_3$.



                   Column Treatment 1    Column Treatment 2    Column Treatment 3
Row treatment 1           15                    22                    18
Row treatment 2           19                    24                    24
Row treatment 3           21                    27                    23


6.2 1) Show that arrangement I in the following table is connected. 

2) Show that arrangement II in the following table is not connected. 
[Table for Exercise 6.2: Arrangement I and Arrangement II, each a grid of row treatments by column treatments marked with "X"; the positions of the marks are not recoverable here.]

"X" denotes that the treatment combination was observed, rows are row treatments,
and columns are column treatments.

6.3 For the following data set fit the two-way effects model 

    $y_{ijk} = \mu + \tau_i + \beta_j + \gamma_{ij} + \varepsilon_{ijk}$,  $i = 1, 2, 3$,  $j = 1, 2, 3$,  $k = 0, 1,$ or $2$

by obtaining a solution to the normal equations. 

1) Obtain estimates of the row treatment marginal means, column treatment 
marginal means, and the cell marginal means. 

2) Use the estimates of the cell marginal means from part 1) to verify that the
row treatment marginal means are computed as if there is only one observation
per cell.

3) Use the proportions in the following population structure table to obtain 
row treatment and column treatment weighted marginal means. 

4) Discuss the difference between the results from parts 2) and 3) for row 2 and 
column 3. 

5) Obtain an estimate of the difference between row 1 and row 3 and construct 
a 95% confidence interval about the difference between the two row means. 
Use information for both the least squares means approach and the weighted 
means approach. 

Data for Exercise 6.3

                   Column Treatment 1    Column Treatment 2    Column Treatment 3
Row treatment 1         15, 13                22, 19                18, 20
Row treatment 2           19                  24, 26                  —
Row treatment 3         21, 22                  27                  23, 23

Population Structure Proportions for Exercise 6.3

                   Column Treatment 1    Column Treatment 2    Column Treatment 3
Row treatment 1           15%                   10%                    5%
Row treatment 2           20%                   15%                    0%
Row treatment 3           10%                   15%                   10%




Balanced Two-Way Treatment Structures 


Chapters 4 and 5 discussed how to analyze the design structure of an experiment. This 
chapter will discuss methods of analyzing the treatment structure of an experiment. In 
particular, this chapter will emphasize the analysis of a two-way cross-classified treatment
structure. Suppose there are two sets of treatments $T_1, T_2, \ldots, T_t$ and $B_1, B_2, \ldots, B_b$. Each
one of the T-treatments is to be combined with each one of the B-treatments and assigned
to an experimental unit. For convenience, assume that the experimental units are assigned
to the treatment combinations completely at random. Alternatively, suppose a survey is
taken randomly from a large set of experimental units and that experimental units in the
sample can be assigned to categories according to the values of $T_1, T_2, \ldots, T_t$ and $B_1, B_2, \ldots, B_b$.
In either of these two situations, a total of bt populations are sampled in a cross-classified 
treatment structure. 

Analyzing the treatment structure and analyzing the design structure of an experiment 
are usually performed independently except for split-plot and repeated measures designs 
and variations of these considered in later chapters of the book. Thus, it makes little difference
whether the experimental units are grouped into complete blocks, balanced incomplete
blocks, Latin squares, or some other grouping as the analysis of the treatment structure
is similar for most standard designs. Since $bt$ populations are sampled, there are $bt - 1$
degrees of freedom in the sum of squares for testing the hypothesis of equal treatment
means, i.e., the hypothesis that $\mu_{11} = \mu_{12} = \cdots = \mu_{tb}$. This chapter considers different partitions
tions of the treatment sum of squares to test different kinds of hypotheses that are usually 
of interest to experimenters. 


7.1 Model Definition and Assumptions 

7.1.1 Means Model 

The means model is defined by 

    $y_{ijk} = \mu_{ij} + \varepsilon_{ijk}$,  $i = 1, 2, \ldots, t$,  $j = 1, 2, \ldots, b$,  $k = 1, 2, \ldots, n$    (7.1)

where it is assumed that $\varepsilon_{ijk} \sim$ i.i.d. $N(0, \sigma^2)$, $i = 1, 2, \ldots, t$, $j = 1, 2, \ldots, b$, $k = 1, 2, \ldots, n$, and $\mu_{ij}$
is the response expected when treatment combination $(T_i, B_j)$ is assigned to a randomly
selected experimental unit.


7.1.2 Effects Model 

Another common model in experiments having a two-way treatment structure is known 
as the effects model, which is defined by Equation 7.1 with $\mu_{ij}$ replaced by

    $\mu_{ij} = \mu + \tau_i + \beta_j + \gamma_{ij}$,  $i = 1, 2, \ldots, t$,  $j = 1, 2, \ldots, b$    (7.2)

Philosophically, the effects model is intuitively appealing because it might be motivated
by assuming that $\mu$ represents some overall mean effect, $\tau_i$ represents the effect on the
mean as a result of assigning $T_i$ to the experimental unit, $\beta_j$ represents the effect of assigning
$B_j$ to the experimental unit, and $\gamma_{ij}$ represents any additional effects, either positive or
negative, that might result from using both $T_i$ and $B_j$ at the same time on the experimental
unit. However, the effects model presents some estimability problems, as pointed out in 
Chapter 6. Both models are discussed below. 


7.2 Parameter Estimation 

In most cases, the estimate of $\sigma^2$ is derived solely from the analysis of the design structure
of an experiment. However, in some instances the experimenter may be able to make
certain assumptions about the $\mu_{ij}$ or the treatment effects that might provide some additional
information about $\sigma^2$. For example, if the experimenter knows that $\mu_{11} = \mu_{12}$, then the
samples from the populations with means $\mu_{11}$ and $\mu_{12}$ could be combined to provide an
additional degree of freedom for estimating $\sigma^2$. In this case, the corresponding single
degree of freedom sum of squares is given by $n(\bar y_{11\cdot} - \bar y_{12\cdot})^2/2$.

Most experimenters would not tend to believe that $\mu_{11} = \mu_{12}$, but there are other beliefs
that might be appropriate. One such belief is that the effects of the two sets of treatments
are additive; that is, the two sets of treatments do not interact. If this were true then
additional information would be available for estimating $\sigma^2$.

All of the discussion provided in Chapters 1-3 for the one-way treatment structure applies
to the two-way treatment structure if one considers the $bt$ combinations as $bt$ different
treatments; that is, $[\mu_{11}, \mu_{12}, \ldots, \mu_{1b}, \ldots, \mu_{t1}, \mu_{t2}, \ldots, \mu_{tb}] = [\mu_1, \mu_2, \ldots, \mu_{bt}]$, say.

Consequently, the best estimates of the parameters in the means model are

    $\hat\mu_{ij} = \dfrac{1}{n}\sum_{k=1}^{n} y_{ijk} = \bar y_{ij\cdot}$,  $i = 1, 2, \ldots, t$,  $j = 1, 2, \ldots, b$

and

    $\hat\sigma^2 = \dfrac{1}{N - bt}\sum_{i,j,k} (y_{ijk} - \bar y_{ij\cdot})^2$

where $N = nbt$. The sampling distribution of $\hat\mu_{ij}$ is $N(\mu_{ij}, \sigma^2/n)$ for $i = 1, 2, \ldots, t$, $j = 1, 2, \ldots, b$
and the sampling distribution of $(N - bt)\hat\sigma^2/\sigma^2$ is $\chi^2(N - bt)$. In addition, $\hat\mu_{11}, \hat\mu_{12}, \ldots, \hat\mu_{tb}$ and
$\hat\sigma^2$ have independent probability distributions.

Most often an experimenter will want to answer the following questions: 


1) How do the T-treatments affect the response? 

2) How do the B-treatments affect the response? 


7.3 Interactions and Their Importance 

In order to give good answers to the above questions, the experimenter must first determine
whether these two sets of treatments interact. The interaction hypothesis can be
stated in several equivalent ways. Some of these are given below.

1) $H_0: \mu_{ij} - \mu_{ij'} = \mu_{i'j} - \mu_{i'j'}$ for all $i \ne i'$ and $j \ne j'$,

2) $H_0: \mu_{ij} - \mu_{i'j} = \mu_{ij'} - \mu_{i'j'}$ for all $i \ne i'$ and $j \ne j'$,

3) $H_0: \mu_{ij} - \mu_{i'j} - \mu_{ij'} + \mu_{i'j'} = 0$ for all $i \ne i'$ and $j \ne j'$,

4) $H_0: \mu_{ij} = \mu + \tau_i + \beta_j$ for all $i$ and $j$ for some set of parameters $\mu, \tau_1, \tau_2, \ldots, \tau_t, \beta_1, \beta_2, \ldots, \beta_b$.

Each of 1-4 implies that there is no interaction between the two sets of treatment effects. 
The interpretation of 1 is that the difference between any pair of B-treatments is the same 
regardless of which T-treatment they are combined with. Similarly, the interpretation of 2 
is that the difference between any pair of T-treatments is the same regardless of which 
B-treatment they are combined with. Both 1 and 2 are algebraically equivalent to 3, which 
is often referred to as the set of all possible two by two table differences. The interpretation 
of 4 is that the effects of the two sets of treatments are additive. If any of 1-4 is true, then it 
is stated that there is no interaction between the T-treatments and B-treatments. 
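
Form 3 of the hypothesis lends itself to a direct check on a table of cell means. The sketch below uses a hypothetical function, `two_by_two_differences`, and hypothetical tables: one built additively, as in form 4, and one in which a single cell has been shifted.

```python
from itertools import combinations

def two_by_two_differences(mu):
    """All mu[i][j] - mu[i2][j] - mu[i][j2] + mu[i2][j2] with i < i2 and j < j2."""
    t, b = len(mu), len(mu[0])
    return {
        (i, i2, j, j2): mu[i][j] - mu[i2][j] - mu[i][j2] + mu[i2][j2]
        for i, i2 in combinations(range(t), 2)
        for j, j2 in combinations(range(b), 2)
    }

# Hypothetical additive table: mu_ij = 10 + tau_i + beta_j (t = 3, b = 4).
tau, beta = [0, 2, 5], [0, 1, 3, 4]
additive = [[10 + a + c for c in beta] for a in tau]

# Same table with one cell shifted, so that only that cell interacts.
shifted = [row[:] for row in additive]
shifted[2][3] += 6

diffs_additive = two_by_two_differences(additive)
diffs_shifted = two_by_two_differences(shifted)

print(all(d == 0 for d in diffs_additive.values()))        # True
print(sorted(k for k, d in diffs_shifted.items() if d != 0))
```

Every two by two difference vanishes for the additive table, while in the shifted table the nonzero differences are exactly those that involve the altered cell.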

If the two sets of treatments do not interact, then the effects of each set of treatments can 
best be compared after averaging over the effects of the second set of treatments. Such a 
comparison is best in the sense that averaging provides more power for comparing the 
effects of two or more treatments, or equivalently, averaging gives the shortest possible 
confidence intervals on effect differences. If the two sets of treatments interact, then 
differences between the effects of one set of treatments depend on the level of the second 
treatment set with which they are combined, and the analysis of the experiment is slightly 
more complex. 


7.4 Main Effects 

If the experimenter concludes that the two sets of treatments do not interact, then hypotheses
about the main effects can be tested. These hypotheses can be written as:

    $H_{01}: \bar\mu_{1\cdot} = \bar\mu_{2\cdot} = \cdots = \bar\mu_{t\cdot}$  and  $H_{02}: \bar\mu_{\cdot 1} = \bar\mu_{\cdot 2} = \cdots = \bar\mu_{\cdot b}$

Even if there is interaction in the experiment, the above two hypotheses can still be
tested. However, the interpretations of the results of the tests in these two situations will
be quite different.


7.5 Computer Analyses 

Most statistical analysis packages automatically give the tests for the two main effects 
hypotheses and the interaction hypothesis described in the preceding two sections 
provided that one specifies a model of the form 

    y = T  B  T x B

Most also have an option that allows the user to compare the main effect means to one 
another by using one or more of the multiple comparison procedures discussed in Chapter 3. 
Most packages also allow the user to specify and test contrasts of the user's own choosing. 

If it is determined that the two sets of treatments interact, then the experimenter may 
want to compare the differences between all pairs of the bt treatment combinations. This 
can be done by hand if such comparisons cannot be made by the statistical package being 
used. Alternatively, if the statistical analysis package does not allow multiple comparisons 
of the T x B cell means, it can often be tricked into doing so. To do this, one can include a 
new identification variable in the data file so that the new variable takes on bt different 
values, one for each of the bt treatment combinations. This new variable can be used to 
reanalyze the data as a one-way treatment structure, thus yielding multiple comparisons 
on the two-way cell means. One may also be interested in adjusting for carrying out 
multiple tests as described in Chapter 3. 
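
The relabeling device described above amounts to building one identifier per treatment combination. A minimal sketch, with a hypothetical label format and hypothetical records:

```python
def cell_label(t_level, b_level):
    """Build a single identifier for the (T, B) treatment combination."""
    return f"T{t_level}:B{b_level}"

# Hypothetical records: (T level, B level, response).
records = [(1, 1, 78), (1, 2, 85), (2, 1, 79), (2, 2, 89), (1, 1, 74)]

# Replace the two classification variables by the single combined variable.
relabeled = [(cell_label(t, b), y) for t, b, y in records]
labels = sorted({label for label, _ in relabeled})
print(labels)  # ['T1:B1', 'T1:B2', 'T2:B1', 'T2:B2']
```

The combined variable takes $bt$ distinct values (four here), so it can be supplied to a one-way analysis in place of the two original factors.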

In the next chapter, a case study is considered that illustrates the concepts discussed in 
this chapter. 


7.6 Concluding Remarks 

There are three basic preliminary hypotheses that are often tested when the treatments are 
arranged in a two-way treatment structure. The most important of these is the interaction 
hypothesis. If there is no interaction, then the main effects of each of the treatments can 
best be compared by averaging over the levels of the other treatment. If there is interaction, 
then the experimenter must be careful to determine whether it makes sense to average 
over the levels of the second treatment when comparing the effects of the first treatment. 
More often than not, it does not make sense. 


7.7 Exercises 


7.1 The following data are from a two-way treatment structure in a completely 
randomized design structure with three replications per treatment combination. 
1) Estimate the parameters of the means model, $y_{ijk} = \mu_{ij} + \varepsilon_{ijk}$, $i = 1, 2, 3$, $j =
1, 2, 3, 4$, and $k = 1, 2, 3$.

2) Estimate the parameters of the effects model, $y_{ijk} = \mu + \tau_i + \beta_j + \gamma_{ij} + \varepsilon_{ijk}$, $i = 1, 2,
3$, $j = 1, 2, 3, 4$, and $k = 1, 2, 3$.

3) Use contrast statements with the means model to provide the sum of squares 
for row treatments, sum of squares for column treatments, and sum of squares 
due to interaction. 

4) Use contrast statements with the effects model to provide the sum of squares 
for row treatments, sum of squares for column treatments, and sum of squares 
due to interaction. 



                   Column         Column         Column         Column
                   Treatment 1    Treatment 2    Treatment 3    Treatment 4
Row treatment 1    78, 74, 75     85, 86, 86     80, 82, 79     89, 87, 87
Row treatment 2    79, 82, 82     89, 87, 86     87, 83, 84     81, 83, 80
Row treatment 3    75, 72, 74     80, 83, 85     76, 79, 78     91, 93, 92





Case Study: Complete Analyses of Balanced 
Two-Way Experiments 


In the preceding chapter, it was assumed that, when the T-treatments and the B-treatments 
interact in an experiment, the experimenter will want to compare the effects of the 
T-treatments at each possibility for one of the B-treatments, or vice versa. In many instances, 
interaction does not occur everywhere in the experiment—often just one or two treatment 
combinations are responsible for the interaction. In other cases, one possibility for one of 
the treatments may interact with the possibilities for the second treatment, while all other 
possibilities of the first treatment do not interact with any possibilities of the second. 

In order to conduct a more complete analysis of data with interaction, it is helpful to 
determine where the interaction occurs in the data. For example, if it is known that all of 
the interaction in an experiment is caused by only one of the possibilities (often a control) 
of the T-treatments, then all of the remaining possibilities of the T-treatments could still be 
compared after averaging over all of the possibilities of the B-treatments. This results in 
more powerful tests for comparing the possibilities of the T-treatments that do not interact 
with the B-treatments. 


8.1 Contrasts of Main Effect Means 

Very often the possibilities of the T-treatments and the possibilities of the B-treatments 
suggest main-effect contrasts that would be particularly interesting to an experimenter. 
Such main effect contrasts suggest special types of interaction contrasts that should also be 
interesting for the experimenter. Furthermore, such interaction contrasts are often easy to 
interpret. Next, main effect contrasts and orthogonal main effect contrasts are defined.

Definition 8.1: A linear combination of the $\bar\mu_{i\cdot}$, $\sum_{i=1}^{t} c_i \bar\mu_{i\cdot}$, is called a contrast in the T main
effect means if $\sum_{i=1}^{t} c_i = 0$. Likewise, a linear combination of the $\bar\mu_{\cdot j}$, $\sum_{j=1}^{b} d_j \bar\mu_{\cdot j}$, is called a
contrast in the B main effect means if $\sum_{j=1}^{b} d_j = 0$.

Definition 8.2: Two contrasts, $\sum_{i=1}^{t} c_i \bar\mu_{i\cdot}$ and $\sum_{i=1}^{t} c_i' \bar\mu_{i\cdot}$, are called orthogonal contrasts in the
T main effect means if $\sum_{i=1}^{t} c_i c_i' = 0$. Similarly, two contrasts, $\sum_{j=1}^{b} d_j \bar\mu_{\cdot j}$ and $\sum_{j=1}^{b} d_j' \bar\mu_{\cdot j}$, are called
orthogonal contrasts in the B main effect means if $\sum_{j=1}^{b} d_j d_j' = 0$.

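Now suppose that

    $S_T = \left\{ \sum_{i=1}^{t} c_{i1}\bar\mu_{i\cdot},\ \sum_{i=1}^{t} c_{i2}\bar\mu_{i\cdot},\ \ldots,\ \sum_{i=1}^{t} c_{i,t-1}\bar\mu_{i\cdot} \right\}$

is a set of $t - 1$ orthogonal contrasts in the T main effect means and that

    $S_B = \left\{ \sum_{j=1}^{b} d_{j1}\bar\mu_{\cdot j},\ \sum_{j=1}^{b} d_{j2}\bar\mu_{\cdot j},\ \ldots,\ \sum_{j=1}^{b} d_{j,b-1}\bar\mu_{\cdot j} \right\}$

is a set of $b - 1$ orthogonal contrasts in the B main effect means. Each of the sets $S_T$ and $S_B$
suggests a partitioning of the two main effect sums of squares. That is,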


P = l,2.. t- l] 


( 8 . 1 ) 


defines a partitioning of the sum of squares for T and 


s; = q 0 2 = 






2 

j n 


, q = l,2,...,b - 1 


( 8 . 2 ) 


defines a partitioning of the sum of squares for B. That is, each Q 2 p in S* is a single-degree- 
of-freedom sum of squares used for testing whether the corresponding contrast in the T 
main effect means is equal to zero. Furthermore, the sum of the t -1 single-degree-of- 
freedom contrasts in S* is equal to the sum of squares for testing H m : fp. = ,0 2 . = ■■■ = ,0,.. 
A similar situation exists for the elements of S B . 

One should not overemphasize the desirability of obtaining orthogonal partitions of the 
basic sums of squares. Orthogonal partitions are nice from a mathematical point of view, 
but may not be all that nice from a practical point of view. Quite often a well-chosen set of 
orthogonal contrasts will enable an experimenter to interpret his/her data wisely, clearly, 
and completely. However, the experimenter should really consider any and all contrasts 
that are meaningful and should not be overly concerned about whether the selected 
contrasts are orthogonal or not. 


8.2 Contrasts of Interaction Effects 

This section begins with the definition of an interaction contrast in a two-way 
experiment. 
Definition 8.3: A linear combination of the $\mu_{ij}$, $\sum_i \sum_j \omega_{ij} \mu_{ij}$, is called an interaction contrast
if $\sum_i \omega_{ij} = 0$ for $j = 1, 2, \ldots, b$ and $\sum_j \omega_{ij} = 0$ for $i = 1, 2, \ldots, t$.

Contrasts in the main effects of a two-way experiment give rise to special types of
interaction contrasts. Suppose that $\sum_{i=1}^{t} c_i \bar{\mu}_{i\cdot}$ is a contrast in the T main effect means and that
$\sum_{j=1}^{b} d_j \bar{\mu}_{\cdot j}$ is a contrast in the B main effect means. Then $\sum_i \sum_j c_i d_j \mu_{ij}$ is an interaction contrast.
That is, if one takes $\omega_{ij} = c_i d_j$ for all $i$ and $j$, then one gets an interaction contrast. Two
interaction contrasts, $\sum_i \sum_j \omega_{ij} \mu_{ij}$ and $\sum_i \sum_j \omega_{ij}' \mu_{ij}$, are called orthogonal interaction contrasts if
$\sum_i \sum_j \omega_{ij} \omega_{ij}' = 0$. Orthogonal contrasts in the two sets of main effects give rise to orthogonal
contrasts in the interaction effects. Suppose that $\sum_{i=1}^{t} c_i \bar{\mu}_{i\cdot}$ and $\sum_{i=1}^{t} c_i' \bar{\mu}_{i\cdot}$ are two contrasts in
the T main effect means and suppose that $\sum_{j=1}^{b} d_j \bar{\mu}_{\cdot j}$ and $\sum_{j=1}^{b} d_j' \bar{\mu}_{\cdot j}$ are two contrasts in the
B main effect means, then $\sum_i \sum_j c_i d_j \mu_{ij}$ and $\sum_i \sum_j c_i' d_j' \mu_{ij}$ are orthogonal interaction contrasts
if either $\sum_{i=1}^{t} c_i c_i' = 0$ or $\sum_{j=1}^{b} d_j d_j' = 0$; that is, if at least one of the pairs of main effect contrasts
is an orthogonal pair of contrasts, the two interaction contrasts will be orthogonal to one
another. Next suppose that $S_T$ and $S_B$ are as defined in Section 8.1, and let $S_{T \times B}^*$ be defined by

$$S_{T \times B}^* = \left\{ Q_{pq}^2 = \frac{\left( \sum_i \sum_j c_{ip} d_{jq}\, \bar{y}_{ij\cdot} \right)^2}{\frac{1}{n} \sum_i \sum_j c_{ip}^2 d_{jq}^2},\quad p = 1, 2, \ldots, t-1;\ q = 1, 2, \ldots, b-1 \right\} \tag{8.3}$$

$S_{T \times B}^*$ defines a partitioning of the sum of squares for interaction. That is, $\sum_p \sum_q Q_{pq}^2$ = T × B
interaction sum of squares, and all of the $(t - 1)(b - 1)$ single-degree-of-freedom sums of
squares in Equation 8.3 have independent probability distributions. In the next section, an
example is discussed that illustrates the ideas described in the preceding section.
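
The construction of interaction contrasts from main effect contrasts can be checked numerically. The following is a minimal sketch, not part of the original text: the coefficient vectors `c1`, `c2`, `d1`, and `d2` are hypothetical values for a 4 × 3 layout, and the helper names are chosen only for illustration. It forms $\omega_{ij} = c_i d_j$, verifies the row-sum and column-sum conditions of Definition 8.3, and shows that orthogonality of one pair of main effect contrasts is enough to make the two interaction contrasts orthogonal.

```python
def outer(c, d):
    """Interaction coefficients w_ij = c_i * d_j."""
    return [[ci * dj for dj in d] for ci in c]


def is_interaction_contrast(w):
    """Definition 8.3: every row sum and every column sum is zero."""
    rows_ok = all(sum(row) == 0 for row in w)
    cols_ok = all(sum(col) == 0 for col in zip(*w))
    return rows_ok and cols_ok


def dot(w1, w2):
    """Sum over all cells of w_ij * w'_ij."""
    return sum(a * b for r1, r2 in zip(w1, w2) for a, b in zip(r1, r2))


# Hypothetical coefficients for a 4 x 3 layout (t = 4, b = 3).
c1, c2 = [1, -1, 0, 0], [1, 1, -1, -1]   # sum of c1*c2 is 0: orthogonal pair
d1, d2 = [1, -1, 0], [2, -1, -1]         # sum of d1*d2 is 3: not orthogonal

w11 = outer(c1, d1)
w22 = outer(c2, d2)
w12 = outer(c1, d2)

print(is_interaction_contrast(w11), is_interaction_contrast(w22))
print(dot(w11, w22))  # T contrasts orthogonal, so this product sum is 0
print(dot(w11, w12))  # same T contrast, non-orthogonal B contrasts: nonzero
```

The last value is nonzero because neither pair is orthogonal in that comparison: the T contrast is paired with itself and the two B contrasts are not orthogonal.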


8.3 Paint-Paving Example 

Consider the experiment in Table 8.1, which gives the means of three independent
replications of each of the paint × paving treatment combinations. This experiment was
conducted to compare the lifetimes, measured in weeks, of two colors of paint manufactured
by two different companies on three types of paving surfaces. The error sum of squares for
this experiment was 455.04 with 24 degrees of freedom so that $\hat{\sigma}^2 = 18.96$. The usual
analysis of variance table for this experiment is given in Table 8.2.


TABLE 8.1
Paint-Paving Cell Means

Paint        Asphalt I    Asphalt II    Concrete    Mean
Yellow I     15           17            32          21.333
Yellow II    27           30            20          25.667
White I      30           28            29          29.000
White II     34           35            36          35.000
Mean         26.5         27.5          29.25       27.750





TABLE 8.2
Analysis of Variance Table for Paint-Paving Experiment

Source of Variation    df    SS         MS        F        p
Total                  35    2039.79
Paint                   3    896.75     298.92    15.77    <0.001
Paving                  2    46.50      23.25     1.23     n.s.
Paint × paving          6    641.50     106.92    5.64     <0.001
Error                  24    455.04     18.96




The possible values for each of the paint and paving treatments give rise to two sets of 
orthogonal contrasts in the main effect means that might be of interest. These are given in 
Table 8.3. These two sets of orthogonal contrasts in main effects suggest six orthogonal 
contrasts in the interaction effects. These are given in Table 8.4. 

A more complete analysis of these data using the partitioning suggested in Tables 8.3 
and 8.4 is given in Table 8.5. Next the details of the computations necessary to obtain the 
sum of squares for three selected contrasts from Table 8.5 are given. 


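TABLE 8.3
Main Effect Contrasts for Paint-Paving Experiment

Comparison Contrast          Hypothesis
Paints
  Yellow I vs yellow II      $\bar{\mu}_{1\cdot} - \bar{\mu}_{2\cdot} = 0$
  White I vs white II        $\bar{\mu}_{3\cdot} - \bar{\mu}_{4\cdot} = 0$
  Yellow vs white            $\bar{\mu}_{1\cdot} + \bar{\mu}_{2\cdot} - \bar{\mu}_{3\cdot} - \bar{\mu}_{4\cdot} = 0$
Pavings
  Asphalt I vs asphalt II    $\bar{\mu}_{\cdot 1} - \bar{\mu}_{\cdot 2} = 0$
  Asphalt vs concrete        $\bar{\mu}_{\cdot 1} + \bar{\mu}_{\cdot 2} - 2\bar{\mu}_{\cdot 3} = 0$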


TABLE 8.4
Interaction Hypotheses for Paint-Paving Data

Comparison Contrast    Hypothesis
Yellow × asphalt       $\mu_{11} - \mu_{12} - \mu_{21} + \mu_{22} = 0$
White × asphalt        $\mu_{31} - \mu_{32} - \mu_{41} + \mu_{42} = 0$
Color × asphalt        $\mu_{11} - \mu_{12} + \mu_{21} - \mu_{22} - \mu_{31} + \mu_{32} - \mu_{41} + \mu_{42} = 0$
Yellow × type          $\mu_{11} + \mu_{12} - 2\mu_{13} - \mu_{21} - \mu_{22} + 2\mu_{23} = 0$
White × type           $\mu_{31} + \mu_{32} - 2\mu_{33} - \mu_{41} - \mu_{42} + 2\mu_{43} = 0$
Color × type           $\mu_{11} + \mu_{12} - 2\mu_{13} + \mu_{21} + \mu_{22} - 2\mu_{23} - \mu_{31} - \mu_{32} + 2\mu_{33} - \mu_{41} - \mu_{42} + 2\mu_{43} = 0$

Note: "Type" refers to asphalt vs concrete.

TABLE 8.5
Analysis of Variance Table for Paint-Paving Experiment Including
Single-Degree-of-Freedom Tests

Source of Variation      df    SS         MS        F        p
Total                    35    2039.79
Paint                     3    896.75     298.92    15.77    <0.0001
  Yellow                  1    84.5       84.5      4.46     <0.05
  White                   1    162.0      162.0     8.54     <0.01
  Color                   1    650.25     650.25    34.30    <0.0001
Paving                    2    46.5       23.25     1.23     n.s.
  Asphalt                 1    6.0        6.0       0.32     n.s.
  Type                    1    40.5       40.5      2.14     n.s.
Paint × paving            6    641.5      106.92    5.64     <0.001
  Yellow × asphalt        1    0.75       0.75      0.04     n.s.
  White × asphalt         1    6.75       6.75      0.36     n.s.
  Color × asphalt         1    13.5       13.5      0.71     n.s.
  Yellow × type           1    600.25     600.25    31.66    <0.0001
  White × type            1    2.25       2.25      0.12     n.s.
  Color × type            1    18.0       18.0      0.95     n.s.
Error                    24    455.04     18.96




The single-degree-of-freedom sum of squares for comparing the two white paint means is,
from Equation 8.1,

$$Q^2 = \frac{3 \cdot 3\,[(1)(29.0) + (-1)(35.0)]^2}{(1)^2 + (-1)^2} = 162.0$$

The single-degree-of-freedom sum of squares for comparing asphalt with concrete is, from
Equation 8.2,

$$Q^2 = \frac{3 \cdot 4\,[(1)(26.5) + (1)(27.5) + (-2)(29.25)]^2}{1^2 + 1^2 + (-2)^2} = 40.5$$

and the single-degree-of-freedom sum of squares for comparing the white × type
interaction is, from Equation 8.3,

$$Q^2 = \frac{3\,[(1)(30) + (1)(28) + (-2)(29) + (-1)(34) + (-1)(35) + (2)(36)]^2}{(1)^2 + (1)^2 + (-2)^2 + (-1)^2 + (-1)^2 + (2)^2} = 2.25$$
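
These three single-degree-of-freedom sums of squares can be reproduced from the cell means in Table 8.1. The sketch below is an illustration, not the authors' program: the helper names (`contrast_ss`, `t_contrast`, `b_contrast`, `txb_contrast`) are assumptions made here. It relies on the fact that, with $n$ observations in every cell, any contrast written as cell weights $w_{ij}$ has sum of squares $n (\sum w_{ij} \bar{y}_{ij\cdot})^2 / \sum w_{ij}^2$; a main effect contrast is expressed in this form by spreading each coefficient evenly over the cells of its row or column.

```python
from fractions import Fraction

# Cell means from Table 8.1.
# Rows: Yellow I, Yellow II, White I, White II.
# Columns: Asphalt I, Asphalt II, Concrete.
cell_means = [
    [15, 17, 32],
    [27, 30, 20],
    [30, 28, 29],
    [34, 35, 36],
]
n = 3          # replications per cell
t, b = 4, 3    # number of paints and pavings


def contrast_ss(weights):
    """n * (sum w_ij * ybar_ij)^2 / sum w_ij^2 for a balanced layout."""
    estimate = sum(w * m for wrow, mrow in zip(weights, cell_means)
                   for w, m in zip(wrow, mrow))
    sum_sq = sum(w * w for wrow in weights for w in wrow)
    return Fraction(n) * estimate ** 2 / sum_sq


def t_contrast(c):
    """Cell weights for a contrast in the paint marginal means."""
    return [[Fraction(ci, b)] * b for ci in c]


def b_contrast(d):
    """Cell weights for a contrast in the paving marginal means."""
    return [[Fraction(dj, t) for dj in d] for _ in range(t)]


def txb_contrast(c, d):
    """Cell weights w_ij = c_i * d_j for an interaction contrast."""
    return [[ci * dj for dj in d] for ci in c]


white = [0, 0, 1, -1]
paving_type = [1, 1, -2]

ss_white = contrast_ss(t_contrast(white))
ss_type = contrast_ss(b_contrast(paving_type))
ss_white_type = contrast_ss(txb_contrast(white, paving_type))

# The three orthogonal paint contrasts add up to the paint sum of squares.
paint_contrasts = ([1, -1, 0, 0], white, [1, 1, -1, -1])
ss_paint = sum(contrast_ss(t_contrast(c)) for c in paint_contrasts)

print(float(ss_white), float(ss_type), float(ss_white_type), float(ss_paint))
```

Exact fractions are used so that the partition of the paint sum of squares can be compared without rounding error.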

From examining the analysis in Table 8.5, one can make the following conclusions: 

1) All of the interaction in the experiment is caused by the two yellow paints acting 
differently on the two types of surfaces, since this interaction contrast is the only 
single degree of freedom sum of squares for interaction that is significant. 
2) Because we now know where the interaction exists in the data, we can make the
following observations:

a) Since there is no interaction between asphalt and paint, the two asphalts can be 
compared after averaging across all paints. The value of the F-statistic for this 
comparison is F = 0.32; thus, there is no significant difference between asphalts 
I and II. 

b) Since there is no interaction between the white paints and the three pavings, 
the two white paints can be compared after averaging across all pavings. The 
value of the F-statistic for this comparison is F = 8.54, which indicates that white 
paint II is significantly different from white paint I. From Table 8.1, we see that 
white paint II lasts longer. 

c) Although the statistic for comparing yellow I vs yellow II is significant (F = 4.46),
one must be careful when making an interpretation because of the significant 
interaction between the brands of yellow paint and the type of paving. 

d) Even though the comparison for asphalt vs concrete is not significant (F = 2.14), 
one must again be careful when making an interpretation because of the 
significant interaction between the brands of paint and the types of paving. 

3) To complete the analysis of these data, we should yet examine: 

a) Yellow I vs yellow II on asphalt (that is, test $\mu_{11} + \mu_{12} - \mu_{21} - \mu_{22} = 0$).

b) Yellow I vs yellow II on concrete. 

c) Concrete vs asphalt for yellow I. 

d) Concrete vs asphalt for yellow II. 

e) The three pavings for white paints. 

The results are given in Table 8.6. 

Examination of the results in Table 8.6 and the means in Table 8.1 reveals that 1) yellow
II is significantly better than yellow I on asphalt, but 2) yellow I is significantly better than
yellow II on concrete; 3) yellow I lasts significantly longer on concrete than on asphalt;
4) yellow II lasts significantly longer on asphalt than on concrete; and finally, 5) the white
paints last about the same length of time on all three pavings.

All of the results obtained for our analysis of this example can be obtained using many 
statistical computing packages such as SAS® and SPSS using their contrast options. Some 
of these programs may require that the 12 treatment combinations be considered as a 
one-way treatment structure. 
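
For readers without access to such a package, the follow-up tests a) through e) can also be computed directly from the cell means. The sketch below is only an illustration under the assumptions stated in its comments: the weight matrices are one way of writing the five comparisons as cell contrasts, and the 2-degree-of-freedom comparison e) is obtained as the sum of two orthogonal single-degree-of-freedom contrasts among the pavings, averaged over the two white paints.

```python
# Cell means from Table 8.1 (rows: Yellow I, Yellow II, White I, White II;
# columns: Asphalt I, Asphalt II, Concrete), with n = 3 replications per cell.
cell_means = [
    [15, 17, 32],
    [27, 30, 20],
    [30, 28, 29],
    [34, 35, 36],
]
n = 3
mse = 455.04 / 24   # error mean square, 18.96 on 24 degrees of freedom


def contrast_ss(weights):
    """n * (sum w_ij * ybar_ij)^2 / sum w_ij^2 for a balanced layout."""
    estimate = sum(w * m for wrow, mrow in zip(weights, cell_means)
                   for w, m in zip(wrow, mrow))
    sum_sq = sum(w * w for wrow in weights for w in wrow)
    return n * estimate ** 2 / sum_sq


zero = [0, 0, 0]
single_df = {
    "a": [[1, 1, 0], [-1, -1, 0], zero, zero],   # yellow I vs II on asphalt
    "b": [[0, 0, 1], [0, 0, -1], zero, zero],    # yellow I vs II on concrete
    "c": [[1, 1, -2], zero, zero, zero],         # asphalt vs concrete, yellow I
    "d": [zero, [1, 1, -2], zero, zero],         # asphalt vs concrete, yellow II
}
ss = {name: contrast_ss(w) for name, w in single_df.items()}

# e) three pavings for the white paints: two orthogonal contrasts, 2 df.
ss_e = (contrast_ss([zero, zero, [1, -1, 0], [1, -1, 0]])
        + contrast_ss([zero, zero, [1, 1, -2], [1, 1, -2]]))

f_stats = {name: value / mse for name, value in ss.items()}
f_stats["e"] = (ss_e / 2) / mse

for name in "abcde":
    print(name, ss.get(name, ss_e), round(f_stats[name], 2))
```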


TABLE 8.6
Tests of Hypotheses in Conclusion 3

Source of Variation    df    SS        MS        F        p
a                       1    468.75    468.75    24.72    <0.0001
b                       1    216.00    216.00    11.39    <0.005
c                       1    512.00    512.00    27.00    <0.0001
d                       1    144.50    144.50    7.62     <0.02
e                       2    3.00      1.50      0.08     n.s.


8.4 Analyzing Quantitative Treatment Factors

In this section it is assumed that the levels of both factors of an experiment are
quantitative. In this case, one can define contrasts that measure curvilinear trends in each set of
main effect treatment means. Trends of interest are often linear, quadratic, cubic, and so
on. The corresponding orthogonal contrasts that partition the main effect sums of squares
into effects that measure linear, quadratic, cubic, and so on, trends can then be used to
construct orthogonal contrasts in the interaction effects. The resulting contrasts are called
Lin T × Lin B (linear effect of T by linear effect of B), Lin T × Quad B, and so on. For a 3 × 4
experiment where both treatments have equally spaced levels (for example, 5, 10, 15 for
factor T and 2, 4, 6, 8 for factor B), the Lin T contrast is defined by $-\bar{\mu}_{1\cdot} + \bar{\mu}_{3\cdot} = 0$ and the
Lin B contrast is defined by $-3\bar{\mu}_{\cdot 1} - \bar{\mu}_{\cdot 2} + \bar{\mu}_{\cdot 3} + 3\bar{\mu}_{\cdot 4} = 0$.

Note that for the Lin T contrast, $c_1 = -1$, $c_2 = 0$, and $c_3 = 1$ in $\sum_{i=1}^{3} c_i \bar{\mu}_{i\cdot} = 0$ and in the Lin B
contrast, $d_1 = -3$, $d_2 = -1$, $d_3 = 1$, and $d_4 = 3$ in $\sum_{j=1}^{4} d_j \bar{\mu}_{\cdot j} = 0$. Thus the Lin T × Lin B interaction
contrast is defined by $3\mu_{11} + \mu_{12} - \mu_{13} - 3\mu_{14} - 3\mu_{31} - \mu_{32} + \mu_{33} + 3\mu_{34} = 0$. The values of the $c_i$ and
the $d_j$ used to define these kinds of orthogonal contrasts in the main effect means
can be found in a table of orthogonal polynomial coefficients. See Beyer (1968). These are
reproduced in Table 8.7 for a 3 × 4 experiment.

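Let $x_1, x_2, \ldots, x_t$ represent the levels of factor T and let $z_1, z_2, \ldots, z_b$ represent the levels of
factor B. There always exist parameters $\alpha_{kh}$, $k = 0, 1, 2, \ldots, t - 1$, $h = 0, 1, 2, \ldots, b - 1$, such that
the cell mean parameters can be represented as a polynomial function of $x_i$ and $z_j$. That is,
there exist $\alpha_{kh}$ such that

$$\mu_{ij} = \sum_{k=0}^{t-1} \sum_{h=0}^{b-1} \alpha_{kh}\, x_i^k z_j^h \tag{8.4}$$

Expanding Equation 8.4, one gets

$$\mu_{ij} = \alpha_{00} + \alpha_{10} x_i + \alpha_{20} x_i^2 + \cdots + \alpha_{t-1,0}\, x_i^{t-1} + \alpha_{01} z_j + \alpha_{02} z_j^2 + \cdots + \alpha_{0,b-1}\, z_j^{b-1} + \alpha_{11} x_i z_j + \alpha_{12} x_i z_j^2 + \cdots + \alpha_{t-1,b-1}\, x_i^{t-1} z_j^{b-1}$$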

Table 8.8 gives the expected values of main effect and interaction contrasts for a 3 × 4
experiment in terms of the $\alpha_{kh}$ in Equation 8.4. In constructing the table, it is assumed that
the three levels of the $x$ were coded to -1, 0, and 1 and that the four levels of the $z$ were
coded to -3, -1, 1, and 3. The purpose of providing Table 8.8 is to point out the hypotheses
that are actually being tested when contrasts of this type are being investigated. For
example, the Lin T × Lin B contrast tests the hypothesis that $40\alpha_{11} + 328\alpha_{13} = 0$. Thus, if this effect
is significant, it could be because $\alpha_{11} \neq 0$ or $\alpha_{13} \neq 0$ and not only because $\alpha_{11} \neq 0$ (as many
data analysts might believe).


TABLE 8.7
Orthogonal Contrast Coefficients for a 3 × 4 Experiment

Contrast    Coefficients
T           c_1    c_2    c_3
Lin T       -1      0      1
Quad T      -1      2     -1
B           d_1    d_2    d_3    d_4
Lin B       -3     -1      1      3
Quad B       1     -1     -1      1
Cubic B     -1      3     -3      1


TABLE 8.8
Expected Values of Orthogonal Polynomials in a 3 × 4 Experiment

Comparison Contrast    Hypothesis
Lin T                  $2\alpha_{10} + 10\alpha_{12} = 0$
Quad T                 $2\alpha_{20} + 10\alpha_{22} = 0$
Lin B                  $60\alpha_{01} + 40\alpha_{21} + 492\alpha_{03} + 328\alpha_{23} = 0$
Quad B                 $48\alpha_{02} + 32\alpha_{22} = 0$
Cubic B                $48\alpha_{03} + 32\alpha_{23} = 0$
Lin T × Lin B          $40\alpha_{11} + 328\alpha_{13} = 0$
Lin T × Quad B         $32\alpha_{12} = 0$
Lin T × Cubic B        $96\alpha_{13} = 0$
Quad T × Lin B         $40\alpha_{21} + 328\alpha_{23} = 0$
Quad T × Quad B        $32\alpha_{22} = 0$
Quad T × Cubic B       $96\alpha_{23} = 0$
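
The expected value of an interaction contrast in terms of the polynomial parameters can be worked out mechanically. The following sketch is illustrative only; the function name `alpha_coefficients` is an assumption, and the cubic coefficients (-1, 3, -3, 1) are the standard orthogonal polynomial values for four equally spaced levels. Substituting Equation 8.4 into $\sum_i \sum_j c_i d_j \mu_{ij}$ shows that the multiplier of $\alpha_{kh}$ is $(\sum_i c_i x_i^k)(\sum_j d_j z_j^h)$, which the code evaluates for the coded levels.

```python
# Coded levels assumed in the text: x in {-1, 0, 1}, z in {-3, -1, 1, 3}.
x = [-1, 0, 1]
z = [-3, -1, 1, 3]

lin_t = [-1, 0, 1]
lin_b = [-3, -1, 1, 3]
quad_b = [1, -1, -1, 1]
cubic_b = [-1, 3, -3, 1]


def alpha_coefficients(c, d):
    """Nonzero multipliers of alpha_kh in sum_i sum_j c_i d_j mu_ij."""
    result = {}
    for k in range(len(x)):
        t_part = sum(ci * xi ** k for ci, xi in zip(c, x))
        for h in range(len(z)):
            b_part = sum(dj * zj ** h for dj, zj in zip(d, z))
            if t_part * b_part != 0:
                result[(k, h)] = t_part * b_part
    return result


lin_lin = alpha_coefficients(lin_t, lin_b)
lin_quad = alpha_coefficients(lin_t, quad_b)
lin_cubic = alpha_coefficients(lin_t, cubic_b)

print(lin_lin)    # keys are (k, h) index pairs of alpha_kh
print(lin_quad)
print(lin_cubic)
```

The Lin T × Lin B result involves both $\alpha_{11}$ and $\alpha_{13}$, which is the point made above: a significant linear-by-linear contrast does not isolate $\alpha_{11}$.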

If one is going to examine orthogonal polynomials, it is recommended that one look at the
coefficients of the highest degree term first and consider the remaining terms in descending
order of degree. Once a higher-order term is determined to be in the model, then all terms
whose two components both have degrees no higher than those of the significant term should also
be included in the model whether significant or not. For example, if one decides that $x^2 z^2$
should be in the model, then the model should also include $x z^2$, $z^2$, $x^2 z$, $x^2$, $xz$, $x$, and $z$.
The reason is that orthogonal polynomials always refer to
coded values of the quantitative variables. Thus, if $\alpha_{22}$ is nonzero, it really implies that

$$\left( \frac{x - h_1}{c_1} \right)^2 \left( \frac{z - h_2}{c_2} \right)^2$$

belongs in the model where

$$\left( \frac{x - h_1}{c_1} \right) \quad \text{and} \quad \left( \frac{z - h_2}{c_2} \right)$$

are the coded values of $x$ and $z$. Expansion of

$$\left( \frac{x - h_1}{c_1} \right)^2 \left( \frac{z - h_2}{c_2} \right)^2$$

demonstrates that the terms $x z^2$, $z^2$, $x^2 z$, $x^2$, $xz$, $x$, and $z$ are also in the model, even though
other lower degree orthogonal polynomials may not have been significant.
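
Written out, the expansion is as follows. This is a worked step added for illustration; $h_1$, $h_2$, $c_1$, and $c_2$ are the centering and scaling constants of the coding, as above.

```latex
\left(\frac{x-h_1}{c_1}\right)^{2}\left(\frac{z-h_2}{c_2}\right)^{2}
  = \frac{1}{c_1^{2}c_2^{2}}\,(x^{2}-2h_1x+h_1^{2})(z^{2}-2h_2z+h_2^{2})
  = \frac{1}{c_1^{2}c_2^{2}}\Bigl[\,x^{2}z^{2}-2h_2\,x^{2}z+h_2^{2}\,x^{2}
      -2h_1\,xz^{2}+4h_1h_2\,xz-2h_1h_2^{2}\,x
      +h_1^{2}\,z^{2}-2h_1^{2}h_2\,z+h_1^{2}h_2^{2}\Bigr]
```

Unless $h_1 = h_2 = 0$, every lower-order term appears with a nonzero multiplier, so none of them can be dropped from the model in the original units.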

8.5 Multiple Comparisons

Any of the multiple comparison procedures discussed in Chapter 3 can be used with only 
some very minor adjustments for making multiple comparisons on the main effects of a 
two-factor experiment. The adjustments require that the $n$ and the $n_i$ be replaced by the
total number of observations that were averaged to estimate the main effect means being 
compared. In this chapter, the sample sizes are nt for the B main effect means and nb for 
the T main effect means. Our recommendations for multiple comparisons on main effect 
means are the same as those given in Section 3.2. The only procedures given in Chapter 3 
that are easily generalized to contrasts in the interaction effects are the LSD procedure, 
Bonferroni's method, the multivariate t method, the simulation method, and Scheffe's 
procedure. We have found that Scheffe's procedure is not very satisfactory because the 
required critical point is much too large and the procedure is much too conservative. Our 
recommendations for multiple comparisons of interaction contrasts are as follows: 

1) Conduct an F-test for interaction. 

2) If the F-statistic is significant, make any planned comparisons by using the LSD 
procedure (or equivalently, the contrast procedure given in the preceding section). 
For data snooping and unplanned comparisons, use the procedure given by 
Johnson (1976), which is not discussed here. 

3) If the F-test for interaction is not significant, the experimenter should still examine 
any individual interaction contrasts that she had planned to consider but by using 
the multivariate t method or Bonferroni's method. The multivariate t method is 
used whenever the selected contrasts are linearly independent; otherwise, 
Bonferroni's method should be used. 


8.6 Concluding Remarks 

In this chapter, we introduced, by giving examples, methods for obtaining a maximum 
amount of information from an experiment. Included were methods for discovering where 
interaction occurs in an experiment. Knowing where interaction occurs in an experiment 
is valuable in determining the best answers to questions that may be raised. The
techniques introduced in this chapter should help experimenters do a better job of analyzing
their experiments. The analysis of quantitative treatment factors was also considered, 
including how to determine what kinds of trends might be related to the different levels of 
the treatment factors. 


8.7 Exercises 


8.1 An experiment was conducted in an RCB design structure (with days as blocks)
to aid in developing a product that can be used as a substrate for making ribbons.
The treatment structure is a two-way with one factor consisting of three
different base (B) polymers (mylar, nylon, and polyethylene). The second factor
consisted of five different additives (A) that could be included to enhance the
formulation. The additives are denoted by: c1, c2, c3, c4, and c5. The variable
of primary interest is the tensile strength of the resulting ribbon. The data are
given below:


B        A     Day    TS       B        A     Day    TS       B       A     Day    TS
Mylar    c1    1      9.2      Nylon    c1    1      8.2      Peth    c1    1      9.2
Mylar    c2    1      8.7      Nylon    c2    1      7.7      Peth    c2    1      13.4
Mylar    c3    1      9.1      Nylon    c3    1      11.4     Peth    c3    1      9.7
Mylar    c4    1      12.4     Nylon    c4    1      8.1      Peth    c4    1      9.1
Mylar    c5    1      10.5     Nylon    c5    1      9.5      Peth    c5    1      8.5
Mylar    c1    2      8.2      Nylon    c1    2      7.2      Peth    c1    2      8.4
Mylar    c2    2      8.7      Nylon    c2    2      7.7      Peth    c2    2      12.5
Mylar    c3    2      8.8      Nylon    c3    2      10.5     Peth    c3    2      9.1
Mylar    c4    2      11.5     Nylon    c4    2      7.8      Peth    c4    2      9.1
Mylar    c5    2      10.6     Nylon    c5    2      9.6      Peth    c5    2      8.9
Mylar    c1    3      8.4      Nylon    c1    3      7.4      Peth    c1    3      8.2
Mylar    c2    3      8.3      Nylon    c2    3      7.8      Peth    c2    3      8.5
Mylar    c3    3      8.7      Nylon    c3    3      7.3      Peth    c3    3      8.8
Mylar    c4    3      8.5      Nylon    c4    3      7.7      Peth    c4    3      8.1
Mylar    c5    3      8.8      Nylon    c5    3      7.1      Peth    c5    3      8.3


Obtain an analysis of the data and write a report that summarizes what you 
believe to be all of the information in the data. 

8.2 An experiment was conducted to study the relationships between three factors
as to their effect on the ability of a tire tread made with the combinations to
increase friction. An RCB design structure was used. Three levels of carbon (5, 7,
and 9) were combined with combinations of two types of rubber gum, type A at
four levels (0.1, 0.3, 0.5, and 0.7) and type B at two levels (0.2 and 0.8). Use
orthogonal polynomials to investigate the effects of the three factors on the friction
index. Obtain an analysis of the data and write a report that summarizes what
you believe to be all of the information in the data. The data follow.


Block    Type A    Type B    C    Friction    Block    Type A    Type B    C    Friction
1        0.3       0.2       5    14          1        0.3       0.2       7    18
1        0.3       0.2       9    20          1        0.3       0.8       5    15
1        0.3       0.8       7    21          1        0.3       0.8       9    17
1        0.5       0.2       5    15          1        0.5       0.2       7    22
1        0.5       0.2       9    21          1        0.5       0.8       5    17
1        0.5       0.8       7    23          1        0.5       0.8       9    15
1        0.7       0.2       5    17          1        0.7       0.2       7    25
1        0.7       0.2       9    27          1        0.7       0.8       5    19
1        0.7       0.8       7    27          1        0.7       0.8       9    18
1        0.9       0.2       5    25          1        0.9       0.2       7    35
1        0.9       0.2       9    35          1        0.9       0.8       5    25
1        0.9       0.8       7    35          1        0.9       0.8       9    25
2        0.3       0.2       5    17          2        0.3       0.2       7    23
2        0.3       0.2       9    24          2        0.3       0.8       5    15
2        0.3       0.8       7    21          2        0.3       0.8       9    17
2        0.5       0.2       5    19          2        0.5       0.2       7    27
2        0.5       0.2       9    25          2        0.5       0.8       5    22
2        0.5       0.8       7    28          2        0.5       0.8       9    18
2        0.7       0.2       5    22          2        0.7       0.2       7    29
2        0.7       0.2       9    30          2        0.7       0.8       5    24
2        0.7       0.8       7    32          2        0.7       0.8       9    23
2        0.9       0.2       5    29          2        0.9       0.2       7    37
2        0.9       0.2       9    38          2        0.9       0.8       5    30
2        0.9       0.8       7    38          2        0.9       0.8       9    32


8.3 Consider Exercise 7.1 in Chapter 7. 

1) Use orthogonal contrasts of the row treatments to show that the sum of 
squares for row treatments can be expressed as the sum of two single degree 
of freedom sums of squares. 

2) Use orthogonal contrasts of the column treatments to show that the sum of 
squares for column treatments can be expressed as the sum of three single 
degree of freedom sums of squares. 

3) Use the orthogonal contrasts in parts 1 and 2 to construct orthogonal
interaction contrasts to show that the sum of squares due to interaction can be
expressed as the sum of the six single degree of freedom sums of squares for
your interaction contrasts.





Using the Means Model to Analyze Balanced 
Two-Way Treatment Structures with Unequal 
Subclass Numbers 


Chapters 7 and 8 considered the equal sample size case, where each treatment combination 
is observed an equal number of times. Chapters 13-15 consider cases where some
treatment combinations are missing, but this chapter, as well as Chapters 10-12, assumes that
every treatment combination is observed and at least one combination is observed more 
than once. 


9.1 Model Definitions and Assumptions 

As in Section 7.1.1, let $\mu_{ij}$ be the expected response when possibility $i$ of treatment T and
possibility $j$ of treatment B are both applied to the same experimental unit. This chapter
assumes that the observed response, $Y_{ijk}$, can be modeled by

$$Y_{ijk} = \mu_{ij} + \varepsilon_{ijk},\quad i = 1, 2, \ldots, t,\ j = 1, 2, \ldots, b,\ k = 1, 2, \ldots, n_{ij} \tag{9.1}$$

where under the ideal conditions,

$$\varepsilon_{ijk} \sim \text{i.i.d. } N(0, \sigma^2),\quad i = 1, 2, \ldots, t,\ j = 1, 2, \ldots, b,\ k = 1, 2, \ldots, n_{ij}$$

and $n_{ij} > 0$ for every $i$ and $j$.


9.2 Parameter Estimation 

Everything discussed for the one-way treatment structure in Chapters 1-3 applies to the
two-way treatment structure as well, if one considers the $bt$ treatment combinations as $bt$
different treatments. For unbalanced data problems that is often the best and simplest way
to analyze the data. The best estimates of the parameters in the means model are

$$\hat{\mu}_{ij} = \frac{1}{n_{ij}} \sum_{k=1}^{n_{ij}} y_{ijk} = \bar{y}_{ij\cdot},\quad i = 1, 2, \ldots, t,\ j = 1, 2, \ldots, b$$

and

$$\hat{\sigma}^2 = \frac{1}{N - bt} \sum_{ijk} (y_{ijk} - \bar{y}_{ij\cdot})^2$$

where $N = n_{\cdot\cdot}$. The sampling distributions of the $\hat{\mu}_{ij}$ and $\hat{\sigma}^2$ are

$$\hat{\mu}_{ij} \sim N(\mu_{ij},\ \sigma^2 / n_{ij}) \quad \text{for } i = 1, 2, \ldots, t,\ j = 1, 2, \ldots, b \tag{9.2}$$

and

$$(N - bt)\,\hat{\sigma}^2 / \sigma^2 \sim \chi^2(N - bt) \tag{9.3}$$

In addition, all of the $\hat{\mu}_{ij}$ and $\hat{\sigma}^2$ are independently distributed as before.

The experimenter will usually want to answer the same questions when the data are 
unbalanced as when they are balanced. Recall that those questions were: 

1) Do the two sets of treatments interact? 

2) How do the T treatments affect the response? 

3) How do the B treatments affect the response? 

These questions can be stated as hypotheses in terms of the parameters of the means 
model, and the corresponding hypotheses are: 

$H_{T \times B}\colon \mu_{ij} - \mu_{i'j} - \mu_{ij'} + \mu_{i'j'} = 0$ for all $i \neq i'$ and $j \neq j'$

$H_T\colon \bar{\mu}_{1\cdot} = \bar{\mu}_{2\cdot} = \cdots = \bar{\mu}_{t\cdot}$

$H_B\colon \bar{\mu}_{\cdot 1} = \bar{\mu}_{\cdot 2} = \cdots = \bar{\mu}_{\cdot b}$

Testing the above hypotheses should be considered as a first step in analyzing any
two-way experiment. There will usually be well-defined contrasts that directly address
other questions of interest to the researcher. The hypotheses $H_{T \times B}$, $H_T$, and $H_B$ are often
tested to help choose an appropriate multiple comparison procedure for addressing other
questions that may be of interest.

As in Equation 7.2, there always exist parameters $\mu$, $\tau_i$, $\beta_j$, and $\gamma_{ij}$, $i = 1, 2, \ldots, t$, $j = 1, 2, \ldots, b$
such that $\mu_{ij}$ can be expressed in an effects model as

$$\mu_{ij} = \mu + \tau_i + \beta_j + \gamma_{ij},\quad i = 1, 2, \ldots, t,\ j = 1, 2, \ldots, b$$

Many experimenters prefer to look at a representation of the treatment combination means
using the effects model. This may be because many statisticians have encouraged
experimenters to consider such models. As a result, much of the existing computer software
leads data analysts towards using effects model representations. Both types of models are
considered in this book. The means model is considered in this chapter and the effects
model is considered in Chapter 10. Sections 1.5 and 1.6 introduced different procedures for
calculating test statistics. For the one-way case, all those methods gave rise to the same test
statistics. This is, in fact, always the case for well-balanced two-way experiments as well.
However, it is not always the case with unbalanced data sets. Well-balanced means that
there is an equal number of observations for each treatment combination. The matrix
procedure is used in this chapter to obtain test statistics, and model fitting procedures are
used in Chapter 10.


9.3 Testing Whether All Means Are Equal 

Consider the data in Table 9.1. The data are from a small two-way treatment structure
experiment conducted in a completely randomized design structure. To begin, the two-way cell
means and the marginal means for the data in Table 9.1 are calculated. Table 9.2 gives these
means where a row marginal mean is defined as the mean of the cell means in a given row,
and a column marginal mean is defined as the mean of the cell means in a given column.

One thing that should be noticed is that in unbalanced two-way experiments there are
two different ways that one can compute T means and B means. Table 9.1 gives means that
are computed by taking row (column) totals and dividing by the number of observations
in the row (column). Table 9.2 gives means that are computed in two steps. First cell means
are computed for each T × B combination, and then means of the cell means are computed
for each row and column. The two methods generally give different answers unless the
experiment is well balanced. For example, the $T_1$ mean in Table 9.1 is 22.75 while the $T_1$
mean in Table 9.2 is 23. Consequently, one question that must be addressed is which of
these two methods should be used when calculating treatment main effect means. Which
method should be used is discussed in Section 9.5.


TABLE 9.1
An Unbalanced Two-Way Experiment

               B1      B2      B3      T Totals    T Means
T1             19      24      22
               20      26      25      182         22.750
               21      -       25
Cell totals    60      50      72
T2             25      21      31
               27      24      32      217         27.125
               -       24      33
Cell totals    52      69      96
B totals       112     119     168     399
B means        22.4    23.8    28.0                24.9375


TABLE 9.2
Cell Means and Marginal Means for the Data in Table 9.1

                    B1    B2    B3    T Marginal Means
T1                  20    25    24    23
T2                  26    23    32    27
B marginal means    23    24    28    25

The experimental error variance for the data in Table 9.1 is

$$\hat{\sigma}^2 = \frac{1}{N - bt} \sum_{ijk} (y_{ijk} - \bar{y}_{ij\cdot})^2 = 20/10 = 2$$

with $N - bt = 16 - 6 = 10$ degrees of freedom.
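
The quantities in Tables 9.1 and 9.2 and the error variance can be recomputed from the raw observations. The sketch below is a plain illustration; the dictionary layout and variable names are assumptions made here. It contrasts the two ways of computing the $T_1$ mean and pools the within-cell sums of squares to estimate the error variance.

```python
# Observations from Table 9.1, keyed by (T level, B level).
data = {
    ("T1", "B1"): [19, 20, 21],
    ("T1", "B2"): [24, 26],
    ("T1", "B3"): [22, 25, 25],
    ("T2", "B1"): [25, 27],
    ("T2", "B2"): [21, 24, 24],
    ("T2", "B3"): [31, 32, 33],
}

cell_means = {cell: sum(y) / len(y) for cell, y in data.items()}

# Method 1 (Table 9.1): total of all T1 observations over their count.
t1_obs = [v for (t_level, _), y in data.items() if t_level == "T1" for v in y]
t1_weighted = sum(t1_obs) / len(t1_obs)

# Method 2 (Table 9.2): unweighted mean of the three T1 cell means.
t1_cells = [m for (t_level, _), m in cell_means.items() if t_level == "T1"]
t1_unweighted = sum(t1_cells) / len(t1_cells)

# Pooled within-cell sum of squares and its degrees of freedom, N - bt.
error_ss = sum((v - cell_means[cell]) ** 2
               for cell, y in data.items() for v in y)
n_total = sum(len(y) for y in data.values())
error_df = n_total - len(data)
sigma2_hat = error_ss / error_df

print(t1_weighted, t1_unweighted)
print(error_ss, error_df, sigma2_hat)
```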

Next consider the experiment as a one-way treatment structure with six treatments and
test that all six treatment combination means are equal to one another. That is, consider
testing

$$H_0\colon \mu_{11} = \mu_{12} = \mu_{13} = \mu_{21} = \mu_{22} = \mu_{23}$$

Using Equation 1.8, one gets

$$SSH_0 = \frac{60^2}{3} + \frac{50^2}{2} + \frac{72^2}{3} + \frac{52^2}{2} + \frac{69^2}{3} + \frac{96^2}{3} - \frac{399^2}{16} = 238.9375$$

which is based on 5 degrees of freedom. Thus the F-statistic for testing $H_0$ is

$$F_c = \frac{238.9375 / 5}{2} = 23.89$$

which is significant at the $\alpha = 0.00003$ level. Consequently, $H_0$ would be rejected, and
thus there are significant differences between the means of the six different treatment
combinations.
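
The arithmetic for this one-way test is short enough to check directly. The following sketch only restates the calculation above in code, using the cell totals and cell sizes from Table 9.1 and the error variance of 2 on 10 degrees of freedom; the variable names are illustrative.

```python
# Cell totals and cell sizes from Table 9.1, in the order
# (T1,B1), (T1,B2), (T1,B3), (T2,B1), (T2,B2), (T2,B3).
cell_totals = [60, 50, 72, 52, 69, 96]
cell_sizes = [3, 2, 3, 2, 3, 3]
sigma2_hat = 2.0   # error mean square on 10 degrees of freedom

grand_total = sum(cell_totals)
n_total = sum(cell_sizes)

# Sum of (cell total)^2 / (cell size) minus the correction term.
ssh0 = (sum(total ** 2 / size for total, size in zip(cell_totals, cell_sizes))
        - grand_total ** 2 / n_total)
df_h0 = len(cell_totals) - 1
f_stat = (ssh0 / df_h0) / sigma2_hat

print(ssh0, df_h0, round(f_stat, 2))
```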


9.4 Interaction and Main Effect Hypotheses 

In the previous section, it was determined that there are significant differences among the
six treatment combination means. Now it is necessary to see where differences occur. As a
first step, consider whether there is significant T × B interaction for the data given in
Table 9.1 and test

$$H_{T \times B}\colon \mu_{ij} - \mu_{i'j} - \mu_{ij'} + \mu_{i'j'} = 0 \quad \text{for all } i \neq i' \text{ and } j \neq j'$$

This can be accomplished by utilizing the matrix procedure for developing test statistics
that was discussed in Section 1.4. The hypothesis $H_{T \times B}$ will be true if and only if

$$\mu_{11} - \mu_{12} - \mu_{21} + \mu_{22} = 0$$

and

$$\mu_{11} - \mu_{13} - \mu_{21} + \mu_{23} = 0$$

In turn, these statements are true if and only if $C\mu = 0$ where

$$C = \begin{bmatrix} 1 & -1 & 0 & -1 & 1 & 0 \\ 1 & 0 & -1 & -1 & 0 & 1 \end{bmatrix} \quad \text{and} \quad \mu = [\mu_{11}\ \ \mu_{12}\ \ \mu_{13}\ \ \mu_{21}\ \ \mu_{22}\ \ \mu_{23}]'$$

Then from Equation 1.11,

$$SSH_{T \times B} = [C\hat{\mu}]'[CDC']^{-1}[C\hat{\mu}] = [-8 \ \ 2]\ \frac{6}{65} \begin{bmatrix} 9 & -5 \\ -5 & 10 \end{bmatrix} \begin{bmatrix} -8 \\ 2 \end{bmatrix}$$

since $D = \mathrm{Diag}[1/3, 1/2, 1/3, 1/2, 1/3, 1/3]$. Thus $SSH_{T \times B} = 776(6/65) = 71.631$ and it is based
on 2 degrees of freedom. The corresponding F-statistic is $F_c = (71.631/2)/2 = 17.91$ which is
based on 2 and 10 degrees of freedom and is significant at the $\alpha = 0.0005$ level. Other C
matrices could be used to test the hypothesis of no interaction, but all such matrices will
produce the same test statistic. The reader can verify that this is true.

Next consider testing the equality of the expected row marginal means. This is being
done for illustration purposes even though such a test may not be appropriate here because
of the significant T × B interaction. The appropriate hypothesis is $H_T\colon \bar{\mu}_{1\cdot} = \bar{\mu}_{2\cdot}$. Note that $H_T$
is true if and only if $C\mu = 0$ where $C = [1\ \ 1\ \ 1\ \ {-1}\ \ {-1}\ \ {-1}]$. Using Equation 1.11 one gets

$$SSH_T = [C\hat{\mu}]'[CDC']^{-1}[C\hat{\mu}] = [-12]\,[14/6]^{-1}\,[-12] = 61.714$$

which is based on 1 degree of freedom, and the corresponding F-statistic is
$F_c = (61.714/1)/2 = 30.857$, which is significant at the $\alpha = 0.00024$ level. One could also take
$C = [\tfrac{1}{3}\ \ \tfrac{1}{3}\ \ \tfrac{1}{3}\ \ {-\tfrac{1}{3}}\ \ {-\tfrac{1}{3}}\ \ {-\tfrac{1}{3}}]$. The reader should verify that this second choice for C leads to the
same F-test statistic as does the C used above.

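Finally, consider testing the equality of the expected column marginal means by testing
$H_B\colon \bar{\mu}_{\cdot 1} = \bar{\mu}_{\cdot 2} = \bar{\mu}_{\cdot 3}$. Note that $H_B$ is true if and only if $C\mu = 0$ where

$$C = \begin{bmatrix} 1 & -1 & 0 & 1 & -1 & 0 \\ 1 & 0 & -1 & 1 & 0 & -1 \end{bmatrix}$$

Using Equation 1.11, one gets

$$SSH_B = [C\hat{\mu}]'[CDC']^{-1}[C\hat{\mu}] = [-2 \ \ {-10}]\ \frac{6}{65} \begin{bmatrix} 9 & -5 \\ -5 & 10 \end{bmatrix} \begin{bmatrix} -2 \\ -10 \end{bmatrix} = 77.169$$

which is based on 2 degrees of freedom. Hence, the corresponding F-statistic is $F_c = 19.29$
which is significant at the $\alpha = 0.00037$ level.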
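
The three matrix computations in this section follow the same pattern and can be carried out with a few lines of exact arithmetic. The sketch below is an illustration, not the authors' software: the helper names `solve` and `ssh` are assumptions, and the linear system is solved by elimination instead of forming the inverse explicitly. For the two tests with 2 numerator degrees of freedom, the significance level is obtained from the closed form $P(F_{2,\nu} > f) = (1 + 2f/\nu)^{-\nu/2}$, which holds only when the numerator has 2 degrees of freedom.

```python
from fractions import Fraction as Fr

# Cell means and cell sizes from Tables 9.1 and 9.2, ordered
# mu_11, mu_12, mu_13, mu_21, mu_22, mu_23.
mu_hat = [20, 25, 24, 26, 23, 32]
n = [3, 2, 3, 2, 3, 3]
sigma2_hat, error_df = 2, 10


def solve(a, y):
    """Solve a x = y by Gauss-Jordan elimination in exact arithmetic."""
    m = [[Fr(v) for v in row] + [Fr(rhs)] for row, rhs in zip(a, y)]
    size = len(m)
    for col in range(size):
        pivot = next(r for r in range(col, size) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(size):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [row[-1] for row in m]


def ssh(C):
    """[C mu]' [C D C']^{-1} [C mu] with D = Diag(1/n_ij)."""
    c_mu = [sum(c * m for c, m in zip(row, mu_hat)) for row in C]
    cdc = [[sum(Fr(a * b, k) for a, b, k in zip(ri, rj, n)) for rj in C]
           for ri in C]
    return sum(v * x for v, x in zip(c_mu, solve(cdc, c_mu)))


C_txb = [[1, -1, 0, -1, 1, 0], [1, 0, -1, -1, 0, 1]]
C_t = [[1, 1, 1, -1, -1, -1]]
C_b = [[1, -1, 0, 1, -1, 0], [1, 0, -1, 1, 0, -1]]

ss_txb, ss_t, ss_b = ssh(C_txb), ssh(C_t), ssh(C_b)
f_txb = float(ss_txb) / 2 / sigma2_hat
f_t = float(ss_t) / 1 / sigma2_hat
f_b = float(ss_b) / 2 / sigma2_hat

# Upper-tail probability for F with 2 and error_df degrees of freedom.
p_txb = (1 + 2 * f_txb / error_df) ** (-error_df / 2)
p_b = (1 + 2 * f_b / error_df) ** (-error_df / 2)

print(round(float(ss_txb), 3), round(f_txb, 2), round(p_txb, 4))
print(round(float(ss_t), 3), round(f_t, 3))
print(round(float(ss_b), 3), round(f_b, 2), round(p_b, 5))
```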

TABLE 9.3
Analysis of Variance Table for Data in Table 9.1

Source of Variation                              df    SS         MS       F        p
Total                                            15    258.938
$\mu_{11} = \mu_{12} = \cdots = \mu_{23}$         5    238.938    47.79    23.89    0.00003
T                                                 1    61.714     61.71    30.86    0.00024
B                                                 2    77.169     38.58    19.29    0.00037
T × B                                             2    71.631     35.81    17.91    0.0005
Error                                            10    20.00      2.00




The above tests are summarized in the analysis of variance table given in Table 9.3. Note
that there are some differences between the results in this analysis and those obtained for
the balanced case, discussed in Chapter 7.

1) For balanced data, it is always true that

$$SS_T + SS_B + SS_{T \times B} = SS_{\mu_{11} = \mu_{12} = \cdots = \mu_{23}}$$

This is generally not true for unbalanced data. In Table 9.3, for instance,
61.714 + 77.169 + 71.631 = 210.514, which differs from 238.938.

2) For the balanced case, $SS_T$, $SS_B$, and $SS_{T \times B}$ are statistically independent; this is
generally not true for unbalanced data.

One does not need to be overly concerned that the sums of squares for T, B, and T × B do
not add up to the sum of squares for testing that all means are equal, or that the sums of
squares for T, B, and T × B are not statistically independent. These points are made here
because they are true, not because they are issues with which one must deal.

There are other sums of squares that are often associated with analyses of two-way 
treatment structures. Two popular sets of these are examined in Chapter 10. 


9.5 Population Marginal Means 

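Often the experimenter is interested in making comparisons about and between the
possibilities of each main effect. In the balanced case, one may compare
$\bar\mu_{1\cdot}, \bar\mu_{2\cdot}, \ldots, \bar\mu_{t\cdot}$ with
one another. As mentioned earlier, these means are called the population marginal means
for both the balanced and unbalanced case. The best estimate of $\bar\mu_{i\cdot}$ is

$$\hat{\bar\mu}_{i\cdot} = \frac{1}{b}\sum_{j=1}^{b}\hat\mu_{ij} = \frac{1}{b}\sum_{j=1}^{b}\bar y_{ij\cdot}, \quad i = 1, 2, \ldots, t \tag{9.4}$$

The estimated standard error of $\hat{\bar\mu}_{i\cdot}$ is

$$\text{s.e.}(\hat{\bar\mu}_{i\cdot}) = \frac{\hat\sigma}{b}\sqrt{\sum_{j=1}^{b}\frac{1}{n_{ij}}}, \quad i = 1, 2, \ldots, t \tag{9.5}$$

The best estimate of $\bar\mu_{\cdot j}$ is

$$\hat{\bar\mu}_{\cdot j} = \frac{1}{t}\sum_{i=1}^{t}\hat\mu_{ij} = \frac{1}{t}\sum_{i=1}^{t}\bar y_{ij\cdot}, \quad j = 1, 2, \ldots, b \tag{9.6}$$

The estimated standard error of $\hat{\bar\mu}_{\cdot j}$ is

$$\text{s.e.}(\hat{\bar\mu}_{\cdot j}) = \frac{\hat\sigma}{t}\sqrt{\sum_{i=1}^{t}\frac{1}{n_{ij}}}, \quad j = 1, 2, \ldots, b \tag{9.7}$$
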


It should be noted that, in unbalanced data problems, it is generally the case that
$\hat{\bar\mu}_{i\cdot}$ will be different from $\bar y_{i\cdot\cdot}$ and that $\hat{\bar\mu}_{\cdot j}$ will be different from
$\bar y_{\cdot j\cdot}$. In the example, $\bar y_{1\cdot\cdot} = 22.75$ and $\hat{\bar\mu}_{1\cdot} = 23$.

The estimators $\hat{\bar\mu}_{i\cdot}$ and $\hat{\bar\mu}_{\cdot j}$ are unbiased estimates of $\bar\mu_{i\cdot}$ and $\bar\mu_{\cdot j}$, respectively. It can be
noted that the estimators $\bar y_{i\cdot\cdot}$ and $\bar y_{\cdot j\cdot}$ are unbiased estimates of

$$\frac{1}{n_{i\cdot}}\sum_{j=1}^{b} n_{ij}\mu_{ij} \quad \text{and} \quad \frac{1}{n_{\cdot j}}\sum_{i=1}^{t} n_{ij}\mu_{ij}$$

respectively. That is, $\bar y_{i\cdot\cdot}$ provides an unbiased estimator of a weighted average of the cell
mean parameters in the ith row using the sample sizes within the row cells as weights.
Likewise, $\bar y_{\cdot j\cdot}$ provides an unbiased estimator of a weighted average of the cell mean
parameters in the jth column using the sample sizes within the column cells as weights. When
using computing packages to analyze data, it is extremely important to determine whether
estimates of the main effect means are calculated as $\hat{\bar\mu}_{i\cdot}$ and $\hat{\bar\mu}_{\cdot j}$ or as $\bar y_{i\cdot\cdot}$ and $\bar y_{\cdot j\cdot}$. In most
designed experiments, one will want to use the estimators $\hat{\bar\mu}_{i\cdot}$ and $\hat{\bar\mu}_{\cdot j}$.

For the data in Table 9.1, the estimated population marginal means are shown in Table 9.2
as $\hat{\bar\mu}_{1\cdot} = 23$, $\hat{\bar\mu}_{2\cdot} = 27$, $\hat{\bar\mu}_{\cdot 1} = 23$, $\hat{\bar\mu}_{\cdot 2} = 24$, and $\hat{\bar\mu}_{\cdot 3} = 28$. The estimated standard errors
of these estimates are

and

$$\text{s.e.}(\hat{\bar\mu}_{\cdot 3}) = \frac{\sqrt{2}}{2}\sqrt{\frac{1}{3} + \frac{1}{3}} = 0.58$$
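
As a rough illustration of Equations 9.4 through 9.7, the sketch below computes both kinds of row and column averages together with the standard errors of the population marginal mean estimates. It assumes the cell means (20, 25, 24 in row 1 and 26, 23, 32 in row 2) inferred from the cell totals quoted in Chapter 10, and $\hat\sigma^2 = 2$; the variable names are illustrative only.

```python
from math import sqrt

# Assumed inputs: cell sizes n_ij and cell means for the 2 x 3 example.
n = [[3, 2, 3], [2, 3, 3]]
ybar = [[20, 25, 24], [26, 23, 32]]
sigma_hat = sqrt(2.0)  # square root of the error mean square
t, b = len(n), len(n[0])

# Population marginal mean estimates: unweighted averages of the cell means.
row_means = [sum(ybar[i]) / b for i in range(t)]
col_means = [sum(ybar[i][j] for i in range(t)) / t for j in range(b)]

# Estimated standard errors from Equations 9.5 and 9.7.
row_se = [sigma_hat / b * sqrt(sum(1 / n[i][j] for j in range(b))) for i in range(t)]
col_se = [sigma_hat / t * sqrt(sum(1 / n[i][j] for i in range(t))) for j in range(b)]

# Raw row averages: cell means weighted by the sample sizes n_ij.
weighted_row = [sum(n[i][j] * ybar[i][j] for j in range(b)) / sum(n[i]) for i in range(t)]

print("marginal means:", row_means, col_means)
print("standard errors:", [round(s, 3) for s in row_se], [round(s, 3) for s in col_se])
print("weighted row averages:", weighted_row)
```

The gap between `row_means[0]` and `weighted_row[0]` is the difference between $\hat{\bar\mu}_{1\cdot}$ and $\bar y_{1\cdot\cdot}$ noted in the text.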


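To make inferences about linear combinations of the population marginal means, say
$\sum_i c_i\bar\mu_{i\cdot}$ or $\sum_j d_j\bar\mu_{\cdot j}$, it can be shown that

$$\frac{\sum_{i} c_i\hat{\bar\mu}_{i\cdot} - \sum_{i} c_i\bar\mu_{i\cdot}}
{\dfrac{\hat\sigma}{b}\sqrt{\sum_{i} c_i^2\left(\sum_{j}\dfrac{1}{n_{ij}}\right)}} \sim t(\nu) \tag{9.8}$$

and that

$$\frac{\sum_{j} d_j\hat{\bar\mu}_{\cdot j} - \sum_{j} d_j\bar\mu_{\cdot j}}
{\dfrac{\hat\sigma}{t}\sqrt{\sum_{j} d_j^2\left(\sum_{i}\dfrac{1}{n_{ij}}\right)}} \sim t(\nu) \tag{9.9}$$
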


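The formulas in Equations 9.8 and 9.9 can be obtained as special cases of Equation 1.4. For
example, the test statistic that tests $H_T\colon \bar\mu_{1\cdot} = \bar\mu_{2\cdot}$ using a t-test is

$$t_c = \frac{\hat{\bar\mu}_{1\cdot} - \hat{\bar\mu}_{2\cdot}}
{\dfrac{\hat\sigma}{b}\sqrt{\sum_{j}\dfrac{1}{n_{1j}} + \sum_{j}\dfrac{1}{n_{2j}}}}
= \frac{23 - 27}{\dfrac{\sqrt{2}}{3}\sqrt{\left(\dfrac13 + \dfrac12 + \dfrac13\right) + \left(\dfrac12 + \dfrac13 + \dfrac13\right)}}
= \frac{-4}{\dfrac{1.414}{3}\sqrt{\dfrac{7}{3}}} = \frac{-4}{0.72} = -5.55$$

which is significant at the $\alpha = 0.00024$ level. A 95% confidence interval for $\bar\mu_{\cdot 1} - \bar\mu_{\cdot 2}$ is

$$\hat{\bar\mu}_{\cdot 1} - \hat{\bar\mu}_{\cdot 2} \pm t_{0.025,10}\,\frac{\hat\sigma}{t}\sqrt{\sum_{i}\frac{1}{n_{i1}} + \sum_{i}\frac{1}{n_{i2}}}
= 23 - 24 \pm t_{0.025,10}\,\frac{1.414}{2}\sqrt{\left(\frac13 + \frac12\right) + \left(\frac12 + \frac13\right)}
= -1 \pm (2.228)(0.91) = -1 \pm 2.03$$
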
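
The t-statistic and the confidence interval above follow the same pattern, so they can be sketched with two small helpers. This is an illustrative sketch only: the helper names are hypothetical, the marginal mean estimates (23, 27 for rows and 23, 24, 28 for columns) are taken from the worked example, and the critical value 2.228 is the $t_{0.025,10}$ quoted in the text rather than computed.

```python
from math import sqrt

n = [[3, 2, 3], [2, 3, 3]]      # cell sizes n_ij
row_means = [23.0, 27.0]        # estimated row marginal means
col_means = [23.0, 24.0, 28.0]  # estimated column marginal means
sigma_hat = sqrt(2.0)
t, b = 2, 3
T_CRIT = 2.228                  # t_{0.025,10}, as quoted in the text


def se_row_combination(c):
    """Standard error of sum_i c_i * (row marginal mean i), as in Equation 9.8."""
    return sigma_hat / b * sqrt(sum(c[i] ** 2 * sum(1 / n[i][j] for j in range(b))
                                    for i in range(t)))


def se_col_combination(d):
    """Standard error of sum_j d_j * (column marginal mean j), as in Equation 9.9."""
    return sigma_hat / t * sqrt(sum(d[j] ** 2 * sum(1 / n[i][j] for i in range(t))
                                    for j in range(b)))


# t-test of equal row marginal means.
c = [1, -1]
t_stat = sum(ci * m for ci, m in zip(c, row_means)) / se_row_combination(c)

# 95% confidence interval for the difference of the first two column marginal means.
d = [1, -1, 0]
estimate = sum(dj * m for dj, m in zip(d, col_means))
half_width = T_CRIT * se_col_combination(d)
interval = (estimate - half_width, estimate + half_width)

print(round(t_stat, 2), tuple(round(v, 2) for v in interval))
```

Note that squaring `t_stat` gives, up to rounding, the F-statistic reported for T in Table 9.3.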


9.6 Simultaneous Inferences and Multiple Comparisons 

There are few good procedures available for making multiple comparisons in two-way
experiments in which there are unequal numbers of observations per treatment combination.
If one wants to compare all pairs of two-way cell means, then any of the techniques
discussed in Chapter 3 can be used simply by considering the two-way experiment as a
one-way treatment structure experiment. In this case the reader should see the
recommendations given in Section 3.2. If one wishes to make multiple comparisons on the population
marginal means, it is recommended that one use t-tests based on Equations 9.8 and 9.9.
Use the given significance levels if the corresponding F-test for comparing the
corresponding marginal means is significant. If the F-test is not significant, then it is still
recommended that one use these t-tests. However, in this case, one should use Bonferroni's
method and claim two population marginal means to be significantly different only when
the calculated significance level is less than $\alpha/p$, where $\alpha$ is the selected experimentwise
error rate and p is the number of comparisons that are being considered prior to collecting
the data. If it is determined that there is interaction in the data, then one would likely want
to compare the effects of one of the treatments for each possibility of the other treatment.
That is, one would likely want to compare the cell means within each row to one another
and to compare the cell means within each column to one another. There are
$bt(t - 1)/2 + bt(b - 1)/2$ such pairwise comparisons. If the F-test comparing all means is significant, then
one can use the actual significance levels for all such pairwise comparisons. This use is the
equivalent of a Fisher's LSD procedure. If the F-test comparing all means is not significant,
then one would take a Bonferroni approach. For data snooping and unplanned comparisons,
one should use Scheffé's procedure. The test statistics described in Section 9.4 can be
obtained automatically with many statistical computing packages. Since these packages
employ the effects model, the interested reader should see Section 10.7.
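
To make the Bonferroni bookkeeping concrete, the sketch below enumerates the pairs of cells that share a row or a column and derives the per-comparison threshold $\alpha/p$. It is only an illustration: the function name is hypothetical and $\alpha = 0.05$ is an assumed experimentwise error rate.

```python
from itertools import combinations


def within_margin_pairs(t, b):
    """List the pairs of cells (i, j) that lie in the same row or the same column."""
    cells = [(i, j) for i in range(1, t + 1) for j in range(1, b + 1)]
    return [(u, v) for u, v in combinations(cells, 2)
            if u[0] == v[0] or u[1] == v[1]]


alpha = 0.05                       # assumed experimentwise error rate
pairs = within_margin_pairs(2, 3)  # the 2 x 3 example of this chapter
p = len(pairs)
threshold = alpha / p              # Bonferroni per-comparison significance level

# Closed-form count: b * t(t - 1)/2 column pairs plus t * b(b - 1)/2 row pairs.
closed_form = 3 * 2 * (2 - 1) // 2 + 2 * 3 * (3 - 1) // 2

print(p, closed_form, round(threshold, 5))
```

For a 4 × 3 layout such as the drug by disease experiment in the exercises, the same function returns 30 pairs.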


9.7 Concluding Remarks 

This chapter is the first of seven considering the analysis of two-way treatment structures 
with unequal subclass numbers. The analyses presented in this chapter were obtained by 
using the means model. An important assumption made was that all treatment combina¬ 
tions were observed at least once. Procedures for testing main effect and interaction hypoth¬ 
eses were obtained as special cases of the general techniques introduced in Chapter 1. 
Population marginal means were defined, and procedures for making inferences on 
the population marginal means were given. In Chapter 10, similar kinds of questions are 
answered by utilizing the effects model; however, the authors hope to make all readers 
equally comfortable with both the means model and the effects model. 


9.8 Exercises 

9.1 Consider an experiment which was conducted to study the effectiveness of four 
types of drugs on three different diseases. The experiment was conducted in a 
completely randomized design with six patients assigned to each drug x disease 
combination initially. Data on the variable to be analyzed (change in disease 
score after two weeks) is missing on 13 of the 72 patients. The data are given 
below. 

1) Analyze these data with SAS-GLM using a means model. Include the following
options on the MODEL statement: E and SOLUTION.

2) Use LSMEANS to perform a multiple comparison procedure that has an
$\alpha = 0.05$ experimentwise error rate when comparing all 12 treatment
combinations with one another.

3) Using CONTRAST and/or ESTIMATE options determine the observed
significance levels for the following hypotheses:

a) $(\bar\mu_{1\cdot} + \bar\mu_{2\cdot})/2 = \bar\mu_{3\cdot} = \bar\mu_{4\cdot}$

b) $\mu_{11} = \mu_{12} = \mu_{13}$

c) $(\bar\mu_{1\cdot} + \bar\mu_{2\cdot})/2 = \bar\mu_{3\cdot}$

d) $\mu_{12} = \mu_{22} = \mu_{32} = \mu_{42}$

e) $\bar\mu_{3\cdot} - \bar\mu_{1\cdot} = 0$

f) $\mu_{12} - \mu_{13} - \mu_{32} + \mu_{33} = 0$

g) $\mu_{22} - \mu_{32} = 0$

4) Construct 95% confidence intervals for the contrasts in e-g above.

5) If there is significant interaction in these data, see if you can determine which
treatment combinations are responsible for the interaction.



          Disease 1                 Disease 2           Disease 3
Drug 1    42, 44, 36, 13, 19, 22    33, 26, 33, 21      31, -3, 25, 24
Drug 2    28, 23, 34, 42, 13        34, 33, 31, 36      3, 26, 28, 32, 4, 16
Drug 3    1, 24, 9, 22, -2, 15      21, 1, 9, 3         11, 9, 7, 1, -6
Drug 4    1, 29, 19                 22, 7, 25, 5, 12    27, 12, -5, 16, 15, 12




Using the Effects Model to Analyze Balanced 
Two-Way Treatment Structures with Unequal 
Subclass Numbers 


Chapter 9 contained a discussion of the analysis of two-way treatment structures having 
unequal subclass numbers using the means model. This chapter considers using an effects 
model in the same situation. All questions that can be answered by using the effects model 
can also be answered by using the means model, and vice versa. The effects model is being 
discussed because it is often an important tool when using statistical computing packages 
to analyze two-way treatment structures as statistical software is often programmed to 
automatically produce test statistics for main effects and interaction effects as well as 
estimates of marginal means, two-way means and their estimated standard errors. 


10.1 Model Definition 

The effects model corresponding to the means model (9.1) is defined by 

$$y_{ijk} = \mu + \tau_i + \beta_j + \gamma_{ij} + \varepsilon_{ijk}, \quad i = 1, 2, \ldots, t;\; j = 1, 2, \ldots, b;\; k = 1, 2, \ldots, n_{ij} \tag{10.1}$$

where $\varepsilon_{ijk} \sim \text{i.i.d. } N(0, \sigma^2)$.


10.2 Parameter Estimates and Type I Analysis 

Other sums of squares are often associated with analyses of two-way treatment structures
having unequal subclass numbers besides those introduced in Chapter 9. Two of these are
examined in this chapter. The first set of sums of squares involves fitting the two-way
effects model to the observed data in a sequential manner using generalizations of
the model comparison method described in Section 1.6. One sequence of steps often used
is the following:

Step 1. Fit $y_{ijk} = \mu + \varepsilon_{ijk}$ and denote its residual sum of squares by $RSS_1$.

Step 2. Fit $y_{ijk} = \mu + \tau_i + \varepsilon_{ijk}$ and denote its residual sum of squares by $RSS_2$.

Step 3. Fit $y_{ijk} = \mu + \tau_i + \beta_j + \varepsilon_{ijk}$ and denote its residual sum of squares by $RSS_3$.

Step 4. Fit $y_{ijk} = \mu + \tau_i + \beta_j + \gamma_{ij} + \varepsilon_{ijk}$ and denote its residual sum of squares by $RSS_4$.

The quantity $RSS_i$ is the residual sum of squares after fitting the model in the ith step,
i = 1, 2, 3, 4. The difference between $RSS_1$ and $RSS_2$, denoted by $R(\tau\,|\,\mu)$, is called the
reduction due to $\tau$ adjusted for $\mu$; that is, $R(\tau\,|\,\mu) = RSS_1 - RSS_2$. This reduction gives the amount
by which one can reduce the residual sum of squares of the model in step 1 by considering
a model with $\tau_i$ included as well. The larger the value of $R(\tau\,|\,\mu)$, the more important
it is to have $\tau_i$ in the model. Thus, $R(\tau\,|\,\mu)$ is a measure of the effect of the different
possibilities for treatment T. The quantity $R(\beta\,|\,\mu, \tau) = RSS_2 - RSS_3$ is called the reduction due
to $\beta$ adjusted for both $\mu$ and $\tau$. It gives the additional amount by which one can reduce
the residual sum of squares of the model in step 2 by also including $\beta_j$ in
the model. $R(\beta\,|\,\mu, \tau)$ is a measure of the effect of different possibilities for treatment B
above and beyond the effect of treatment T. Finally, the quantity $R(\gamma\,|\,\mu, \tau, \beta) = RSS_3 - RSS_4$
is called the reduction due to $\gamma$ adjusted for $\mu$, $\tau$, and $\beta$. It gives the additional amount
by which one can reduce the residual sum of squares of the model in step 3 by adding
interaction parameters, the $\gamma_{ij}$, to the model. Clearly, $R(\gamma\,|\,\mu, \tau, \beta)$ is a measure of
interaction, since the model in step 3 is an additive model that holds if and only if there is
no interaction.

An analysis of variance table corresponding to this sequential analysis is given in 
Table 10.1. This analysis is called a type I analysis. The sums of squares in the last four 
lines of Table 10.1 are statistically independent, and the ratios of the T, B, and T x B mean 
squares to the error mean square all have noncentral F-distributions. It is quite interest¬ 
ing and informative to determine exactly the hypothesis that each of the F-statistics in 
Table 10.1 is testing. The hypotheses that are being tested are given in Section 10.4. 

TABLE 10.1

Analysis of Variance Table for a Sequential Analysis (Type I Analysis)

Source of Variation   df               SS                              MS                                              F
Total                 N − 1            $RSS_1$
T                     t − 1            $R(\tau\,|\,\mu)$               $R(\tau\,|\,\mu)/(t-1)$                          $TMS/\hat\sigma^2$
B                     b − 1            $R(\beta\,|\,\mu,\tau)$         $R(\beta\,|\,\mu,\tau)/(b-1)$                    $BMS/\hat\sigma^2$
T × B                 (t − 1)(b − 1)   $R(\gamma\,|\,\mu,\tau,\beta)$  $R(\gamma\,|\,\mu,\tau,\beta)/[(b-1)(t-1)]$      $(T \times B)MS/\hat\sigma^2$
Error                 N − tb           $RSS_4$                         $\hat\sigma^2 = RSS_4/(N - tb)$

To illustrate, each of the four models required for the type I analysis is fit to the data in
Table 9.1. An understanding of Chapter 6 is required to follow the computations made
here. However, such an understanding is not necessary for those readers willing to let
statistical computing packages perform the required computations. Those readers who are
interested in the details may consider the next few pages. The model given in step 1 is
$y_{ijk} = \mu + \varepsilon_{ijk}$. The best estimate of $\mu$ in this model is $\hat\mu = \bar y_{\cdot\cdot\cdot} = 24.9375$, the average of all of the
observations, and the residual sum of squares is

$$RSS_1 = \sum_{i,j,k}(y_{ijk} - \bar y_{\cdot\cdot\cdot})^2 = \sum_{i,j,k} y_{ijk}^2 - n_{\cdot\cdot}\,\bar y_{\cdot\cdot\cdot}^2 = 10209 - 16(24.9375)^2 = 258.9375$$

and this residual sum of squares is based on $n_{\cdot\cdot} - 1 = 16 - 1 = 15$ degrees of freedom.

The normal equations for the model defined in step 2 are:

$$\begin{bmatrix} 16 & 8 & 8 \\ 8 & 8 & 0 \\ 8 & 0 & 8 \end{bmatrix}
\begin{bmatrix} \hat\mu \\ \hat\tau_1 \\ \hat\tau_2 \end{bmatrix} =
\begin{bmatrix} 399 \\ 182 \\ 217 \end{bmatrix}$$

One possible solution to these equations is obtained by using the set-to-zero restrictions
discussed in Section 6.2, which yields the solution $\hat\tau_2 = 0$, $\hat\tau_1 = -4.375$, and
$\hat\mu = 27.125$ (recall from Chapter 6 that a unique solution does not exist). The residual sum
of squares is

$$RSS_2 = \sum_{i,j,k}(y_{ijk} - \hat\mu - \hat\tau_i)^2 = \sum_{i,j,k} y_{ijk}^2 - \hat\mu\, y_{\cdot\cdot\cdot} - \hat\tau_1\, y_{1\cdot\cdot} - \hat\tau_2\, y_{2\cdot\cdot}$$

$$= 10209 - 27.125 \cdot 399 - (-4.375) \cdot 182 - 0 \cdot 217 = 182.375$$

which is based on 16 − 2 = 14 degrees of freedom. Thus, $R(\tau\,|\,\mu) = 258.9375 - 182.375 =
76.5625$ and is based on 15 − 14 = 1 degree of freedom.

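The normal equations for the model in step 3 are:

$$\begin{bmatrix}
16 & 8 & 8 & 5 & 5 & 6 \\
8 & 8 & 0 & 3 & 2 & 3 \\
8 & 0 & 8 & 2 & 3 & 3 \\
5 & 3 & 2 & 5 & 0 & 0 \\
5 & 2 & 3 & 0 & 5 & 0 \\
6 & 3 & 3 & 0 & 0 & 6
\end{bmatrix}
\begin{bmatrix} \hat\mu \\ \hat\tau_1 \\ \hat\tau_2 \\ \hat\beta_1 \\ \hat\beta_2 \\ \hat\beta_3 \end{bmatrix} =
\begin{bmatrix} 399 \\ 182 \\ 217 \\ 112 \\ 119 \\ 168 \end{bmatrix}$$

To obtain a solution, one can let $\hat\tau_2 = 0$ and $\hat\beta_3 = 0$ (see Chapter 6). Then this system of
equations can be reduced to an equivalent system by deleting the rows and columns that
correspond to $\hat\tau_2$ and $\hat\beta_3$. The system then reduces to:

$$\begin{bmatrix}
16 & 8 & 5 & 5 \\
8 & 8 & 3 & 2 \\
5 & 3 & 5 & 0 \\
5 & 2 & 0 & 5
\end{bmatrix}
\begin{bmatrix} \hat\mu \\ \hat\tau_1 \\ \hat\beta_1 \\ \hat\beta_2 \end{bmatrix} =
\begin{bmatrix} 399 \\ 182 \\ 112 \\ 119 \end{bmatrix}$$

The solution to this reduced system is $\hat\mu = 30.154$, $\hat\tau_1 = -4.308$, $\hat\beta_1 = -5.169$, $\hat\beta_2 = -4.631$. The
residual sum of squares for this model is

$$RSS_3 = \sum_{i,j,k} y_{ijk}^2 - \hat\mu\, y_{\cdot\cdot\cdot} - \hat\tau_1\, y_{1\cdot\cdot} - \hat\beta_1\, y_{\cdot 1\cdot} - \hat\beta_2\, y_{\cdot 2\cdot}$$

$$= 10209 - 30.154 \cdot 399 - (-4.308) \cdot 182 - (-5.169) \cdot 112 - (-4.631) \cdot 119 = 91.631$$

and this residual sum of squares is based on 16 − 4 = 12 degrees of freedom. Thus
$R(\beta\,|\,\mu, \tau) = RSS_2 - RSS_3 = 182.375 - 91.631 = 90.744$, which is based on 14 − 12 = 2 degrees of
freedom.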

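The normal equations for the model in step 4 are

$$\begin{bmatrix}
16 & 8 & 8 & 5 & 5 & 6 & 3 & 2 & 3 & 2 & 3 & 3 \\
8 & 8 & 0 & 3 & 2 & 3 & 3 & 2 & 3 & 0 & 0 & 0 \\
8 & 0 & 8 & 2 & 3 & 3 & 0 & 0 & 0 & 2 & 3 & 3 \\
5 & 3 & 2 & 5 & 0 & 0 & 3 & 0 & 0 & 2 & 0 & 0 \\
5 & 2 & 3 & 0 & 5 & 0 & 0 & 2 & 0 & 0 & 3 & 0 \\
6 & 3 & 3 & 0 & 0 & 6 & 0 & 0 & 3 & 0 & 0 & 3 \\
3 & 3 & 0 & 3 & 0 & 0 & 3 & 0 & 0 & 0 & 0 & 0 \\
2 & 2 & 0 & 0 & 2 & 0 & 0 & 2 & 0 & 0 & 0 & 0 \\
3 & 3 & 0 & 0 & 0 & 3 & 0 & 0 & 3 & 0 & 0 & 0 \\
2 & 0 & 2 & 2 & 0 & 0 & 0 & 0 & 0 & 2 & 0 & 0 \\
3 & 0 & 3 & 0 & 3 & 0 & 0 & 0 & 0 & 0 & 3 & 0 \\
3 & 0 & 3 & 0 & 0 & 3 & 0 & 0 & 0 & 0 & 0 & 3
\end{bmatrix}
\begin{bmatrix} \hat\mu \\ \hat\tau_1 \\ \hat\tau_2 \\ \hat\beta_1 \\ \hat\beta_2 \\ \hat\beta_3 \\ \hat\gamma_{11} \\ \hat\gamma_{12} \\ \hat\gamma_{13} \\ \hat\gamma_{21} \\ \hat\gamma_{22} \\ \hat\gamma_{23} \end{bmatrix} =
\begin{bmatrix} 399 \\ 182 \\ 217 \\ 112 \\ 119 \\ 168 \\ 60 \\ 50 \\ 72 \\ 52 \\ 69 \\ 96 \end{bmatrix}$$

To obtain a solution, let $\hat\tau_2 = 0$, $\hat\beta_3 = 0$, $\hat\gamma_{13} = 0$, $\hat\gamma_{21} = 0$, $\hat\gamma_{22} = 0$, $\hat\gamma_{23} = 0$ (see Chapter 6). Using the
reduction technique, the system reduces to

$$\begin{bmatrix}
16 & 8 & 5 & 5 & 3 & 2 \\
8 & 8 & 3 & 2 & 3 & 2 \\
5 & 3 & 5 & 0 & 3 & 0 \\
5 & 2 & 0 & 5 & 0 & 2 \\
3 & 3 & 3 & 0 & 3 & 0 \\
2 & 2 & 0 & 2 & 0 & 2
\end{bmatrix}
\begin{bmatrix} \hat\mu \\ \hat\tau_1 \\ \hat\beta_1 \\ \hat\beta_2 \\ \hat\gamma_{11} \\ \hat\gamma_{12} \end{bmatrix} =
\begin{bmatrix} 399 \\ 182 \\ 112 \\ 119 \\ 60 \\ 50 \end{bmatrix}$$

The solution to this reduced system is $\hat\mu = 32$, $\hat\tau_1 = -8$, $\hat\beta_1 = -6$, $\hat\beta_2 = -9$, $\hat\gamma_{11} = 2$, $\hat\gamma_{12} = 10$. Thus
$\hat\sigma^2 = 2$, since the residual sum of squares of the full model is $RSS_4 = 10{,}209 - 10{,}189 = 20$ and
is based on 10 degrees of freedom. Also, $R(\gamma\,|\,\mu, \tau, \beta) = RSS_3 - RSS_4 = 91.631 - 20 = 71.631$ and
it is based on 12 − 10 = 2 degrees of freedom. All of the preceding results can be summarized
in an analysis of variance table like the one given in Table 10.2.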
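
The four sequential fits can be reproduced from the cell sizes, the cell totals, and $\sum y_{ijk}^2 = 10209$ alone. The sketch below is one way to do so under stated assumptions: the cell totals 72, 52, 69, and 96 are inferred from the row and column totals in the normal equations, the set-to-zero parameterization is used, and the helper names `solve`, `design_row`, and `rss` are hypothetical.

```python
from fractions import Fraction

# (i, j): (n_ij, cell total). Totals for cells 13, 21, 22, 23 are inferred
# from the row and column totals appearing in the normal equations.
CELLS = {(1, 1): (3, 60), (1, 2): (2, 50), (1, 3): (3, 72),
         (2, 1): (2, 52), (2, 2): (3, 69), (2, 3): (3, 96)}
SUM_Y2 = 10209  # sum of squared observations, as used in the text


def solve(a, b):
    """Solve the square system a x = b exactly by Gauss-Jordan elimination."""
    n = len(a)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        lead = m[col][col]
        m[col] = [v / lead for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [row[n] for row in m]


def design_row(i, j):
    """Set-to-zero columns: mu, tau1, beta1, beta2, gamma11, gamma12."""
    t1, b1, b2 = int(i == 1), int(j == 1), int(j == 2)
    return [1, t1, b1, b2, t1 * b1, t1 * b2]


def rss(cols):
    """Residual sum of squares of the model that keeps the listed columns."""
    rows = {cell: [design_row(*cell)[c] for c in cols] for cell in CELLS}
    k = len(cols)
    xtx = [[sum(CELLS[cell][0] * rows[cell][p] * rows[cell][q] for cell in CELLS)
            for q in range(k)] for p in range(k)]
    xty = [sum(CELLS[cell][1] * rows[cell][p] for cell in CELLS) for p in range(k)]
    beta_hat = solve(xtx, xty)
    return SUM_Y2 - sum(bh * v for bh, v in zip(beta_hat, xty))


rss1, rss2, rss3, rss4 = rss([0]), rss([0, 1]), rss([0, 1, 2, 3]), rss([0, 1, 2, 3, 4, 5])
type1 = {"T": rss1 - rss2, "B": rss2 - rss3, "T x B": rss3 - rss4}
for source, value in type1.items():
    print(source, round(float(value), 4))
```

Because each model is solved exactly, the printed reductions agree with the hand computations above up to the rounding used in the text.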
TABLE 10.2

Type I Analysis of Variance Table

Source of Variation   df    SS          MS         F        $\alpha$
Total                 15    258.9375
T                      1     76.5625    76.5625    38.28    0.0001
B                      2     90.744     45.372     22.69    0.0002
T × B                  2     71.631     35.815     17.91    0.0005
Error                 10     20.000      2.0


The sum of squares as well as the test statistic for interaction in the type I analysis is the
same as that obtained using the means model and the matrix procedure in Chapter 9;
however, the two procedures give different sums of squares and test statistics for both of
the T and B main effects. The data in a two-way treatment structure could also be analyzed
by fitting $\mu$ first, then $\beta$, then $\tau$, and finally $\gamma$. The only new sums of squares required that
are not already given in Table 10.2 are $R(\beta\,|\,\mu)$ and $R(\tau\,|\,\mu, \beta)$. For unbalanced data cases, the
corresponding F-tests for the T and B main effects will usually be different from those
obtained by fitting the effects in the order given in Table 10.1 and/or Table 10.2.

It should be recalled from Chapter 6 that the parameter estimates, $\hat\mu = 32$, $\hat\tau_1 = -8$, $\hat\beta_1 = -6$,
$\hat\beta_2 = -9$, $\hat\gamma_{11} = 2$, $\hat\gamma_{12} = 10$, are not unbiased estimates of the corresponding parameters, $\mu$, $\tau_1$,
$\beta_1$, $\beta_2$, $\gamma_{11}$, and $\gamma_{12}$. Indeed, these individual parameters are not estimable. Under the
set-to-zero restrictions used to solve the normal equations, it is possible to show that

$\hat\mu$ is an unbiased estimate of $\mu + \tau_2 + \beta_3 + \gamma_{23}$
$\hat\tau_1$ is an unbiased estimate of $\tau_1 - \tau_2 + \gamma_{13} - \gamma_{23}$
$\hat\beta_1$ is an unbiased estimate of $\beta_1 - \beta_3 + \gamma_{21} - \gamma_{23}$
$\hat\beta_2$ is an unbiased estimate of $\beta_2 - \beta_3 + \gamma_{22} - \gamma_{23}$   (10.2)
$\hat\gamma_{11}$ is an unbiased estimate of $\gamma_{11} - \gamma_{13} - \gamma_{21} + \gamma_{23}$

and

$\hat\gamma_{12}$ is an unbiased estimate of $\gamma_{12} - \gamma_{13} - \gamma_{22} + \gamma_{23}$.

More about estimable functions and their estimates can be found in the next section.


10.3 Using Estimable Functions in SAS 

The SAS®-GLM procedure has an option that can be used to identify estimable functions of 
the model parameters. (Readers who do not use SAS may skip this section.) Since it is easiest 
to describe estimable functions by using an example, consider again the data in Table 9.1. 
A SAS-GLM analysis of this data can be obtained by using the following statements: 


PROC GLM;

CLASSES T B;

MODEL Y = T B T*B / <selected options>;
TABLE 10.3

General Form of Estimable Functions

Effect           Coefficients
Intercept        L1
T      1         L2
T      2         L1 − L2
B      1         L4
B      2         L5
B      3         L1 − L4 − L5
T × B  1 1       L7
T × B  1 2       L8
T × B  1 3       L2 − L7 − L8
T × B  2 1       L4 − L7
T × B  2 2       L5 − L8
T × B  2 3       L1 − L2 − L4 − L5 + L7 + L8



Many options can be used with the MODEL statement. One of the most important of
these is the E option. This option asks SAS-GLM to print a general form of the estimable
functions of the model parameters. Recall from Chapter 6 that all linear functions of the
parameters in a design model are not necessarily estimable. Using the E option, SAS-GLM
prints information that one can use to determine those linear combinations of the
parameters that are estimable and those that are not estimable.

The general form of an estimable function given by SAS-GLM is shown in Table 10.3. It
means that a linear function of the model parameters $\ell'\beta$, where $\ell' = [\ell_1, \ell_2, \ldots, \ell_{12}]$ and
$\beta' = [\mu, \tau_1, \tau_2, \beta_1, \beta_2, \beta_3, \gamma_{11}, \gamma_{12}, \gamma_{13}, \gamma_{21}, \gamma_{22}, \gamma_{23}]$, is estimable if and only if there exist constants
L1, L2, L4, L5, L7, and L8 such that

$$\begin{aligned}
\ell'\beta = {} & (L1)\mu + (L2)\tau_1 + (L1 - L2)\tau_2 + (L4)\beta_1 + (L5)\beta_2 + (L1 - L4 - L5)\beta_3 + (L7)\gamma_{11} \\
& + (L8)\gamma_{12} + (L2 - L7 - L8)\gamma_{13} + (L4 - L7)\gamma_{21} + (L5 - L8)\gamma_{22} \\
& + (L1 - L2 - L4 - L5 + L7 + L8)\gamma_{23}
\end{aligned}$$

For example, from the general form of estimable functions, we can see:

1) That $\mu$ is not estimable, since in order for $\mu$ to be estimable, one would at least need
to have L1 = 1, L2 = 0, and L1 − L2 = 0 all at the same time, and these three equations
cannot all be true at the same time.

2) That $\tau_1$ is not estimable, since in order for $\tau_1$ to be estimable, one would at least need
to have L1 = 0, L2 = 1, and L1 − L2 = 0 all at the same time, which cannot be true.

3) That $\tau_1 - \tau_2$ is not estimable, since in order for $\tau_1 - \tau_2$ to be estimable, one would at
least need to have L1 = 0, L2 = 1, L1 − L2 = −1, L4 = 0, L5 = 0, L1 − L4 − L5 = 0, L7 = 0,
L8 = 0, and L2 − L7 − L8 = 0 all at the same time. However, L2 = 1, L7 = 0, L8 = 0, and
L2 − L7 − L8 = 0 cannot all be true at the same time.
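
The reasoning in the three cases above can be mechanized: read off the six free constants from a candidate coefficient vector and check whether the general form reproduces the remaining six coefficients. The sketch below does this for the 2 × 3 example; the function names are hypothetical and the coefficient order is $\mu, \tau_1, \tau_2, \beta_1, \beta_2, \beta_3, \gamma_{11}, \ldots, \gamma_{23}$.

```python
def general_form(L1, L2, L4, L5, L7, L8):
    """Coefficient vector implied by the general form of estimable functions."""
    return [L1, L2, L1 - L2,
            L4, L5, L1 - L4 - L5,
            L7, L8, L2 - L7 - L8,
            L4 - L7, L5 - L8, L1 - L2 - L4 - L5 + L7 + L8]


def is_estimable(coef):
    """True when the 12 coefficients are consistent with the general form."""
    free = (coef[0], coef[1], coef[3], coef[4], coef[6], coef[7])
    return list(coef) == general_form(*free)


mu_only = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
tau1_only = [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
tau_diff = [0, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
tau_diff_with_gamma = [0, 1, -1, 0, 0, 0, 0, 0, 1, 0, 0, -1]  # tau1 - tau2 + g13 - g23

for name, coef in [("mu", mu_only), ("tau1", tau1_only), ("tau1 - tau2", tau_diff),
                   ("tau1 - tau2 + gamma13 - gamma23", tau_diff_with_gamma)]:
    print(name, is_estimable(coef))
```

Only the last function passes the check, which matches the second entry of Equation 10.2.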

It is clear that there are many functions of the model parameters that are not estimable. Are
there any interesting functions of the model parameters that are estimable? From Chapter 6,
one knows that there are estimable functions of the model parameters. In fact, a basis set of
estimable functions of the model parameters is defined to be a set of linearly independent
functions that are estimable and so that every other estimable function can be written as a
linear combination of the estimable functions in this basis set. In the general form of the
estimable functions, one can see that one is free to choose six of the L, namely, L1, L2, L4, L5, L7,
and L8. Thus there are six linearly independent estimable functions in a basis set. One basis set
that can be easily obtained is to successively let one of the six L be equal to 1 and let all of the
remaining L be equal to 0. For example, taking L1 = 1 and L2 = L4 = L5 = L7 = L8 = 0, the
general form simplifies to $\mu + \tau_2 + \beta_3 + \gamma_{23}$, giving one linear function of the model parameters that
is estimable. Taking L2 = 1 and the other L equal to 0, the general form simplifies to
$\tau_1 - \tau_2 + \gamma_{13} - \gamma_{23}$, giving a second linear function of the model parameters that is estimable.
Continuing in this manner produces the basis set of estimable functions given in Table 10.4.

TABLE 10.4

A Basis Set of Estimable Functions

L1   L2   L4   L5   L7   L8   Estimable Function
1    0    0    0    0    0    $\mu + \tau_2 + \beta_3 + \gamma_{23}$
0    1    0    0    0    0    $\tau_1 - \tau_2 + \gamma_{13} - \gamma_{23}$
0    0    1    0    0    0    $\beta_1 - \beta_3 + \gamma_{21} - \gamma_{23}$
0    0    0    1    0    0    $\beta_2 - \beta_3 + \gamma_{22} - \gamma_{23}$
0    0    0    0    1    0    $\gamma_{11} - \gamma_{13} - \gamma_{21} + \gamma_{23}$
0    0    0    0    0    1    $\gamma_{12} - \gamma_{13} - \gamma_{22} + \gamma_{23}$

Note that the number of linear functions in the basis set is equal to six, which is equal to
the rank of the X'X matrix as discussed in Chapter 6 and also equal to the number of
treatment combinations in this 2 × 3 experiment. Also note that the estimable functions in Table
10.4 are exactly the same as the functions that are being estimated by the solution to the
normal equations that satisfies the set-to-zero restrictions as shown in Equation 10.2. This
is not a coincidence; it is always true when using SAS-GLM to analyze data that one can
find the functions being estimated by the set-to-zero solution to the normal equations by
simply letting each of the L in the general form that one is free to choose be equal to one
and letting the other L equal to zero.

Basis sets of estimable functions are not unique, and another basis set of estimable 
functions is given in Table 10.5, along with the values of LI, L2, L4, L5, L7, and L8 that 
produced this basis set. 

When one uses the SOLUTION option on the SAS-GLM model statement, the computer 
prints out least squares estimates of the model parameters using the set-to-zero restrictions. 


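TABLE 10.5

Another Basis Set of Estimable Functions

L1   L2    L4    L5    L7    L8    Estimable Function
1    1/2   1/3   1/3   1/6   1/6   $\mu + \bar\tau_{\cdot} + \bar\beta_{\cdot} + \bar\gamma_{\cdot\cdot}$
0    1     0     0     1/3   1/3   $\tau_1 - \tau_2 + \bar\gamma_{1\cdot} - \bar\gamma_{2\cdot}$
0    0     1     0     1/2   0     $\beta_1 - \beta_3 + \bar\gamma_{\cdot 1} - \bar\gamma_{\cdot 3}$
0    0     0     1     0     1/2   $\beta_2 - \beta_3 + \bar\gamma_{\cdot 2} - \bar\gamma_{\cdot 3}$
0    0     0     0     1     0     $\gamma_{11} - \gamma_{13} - \gamma_{21} + \gamma_{23}$
0    0     0     0     0     1     $\gamma_{12} - \gamma_{13} - \gamma_{22} + \gamma_{23}$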
TABLE 10.6

Results Obtained with SOLUTION Option in SAS-GLM

Parameter      Estimate         Standard Error   t-Value   Pr > |t|
Intercept      32.00000000 B    0.81649658       39.19     <0.0001
T      1       -8.00000000 B    1.15470054       -6.93     <0.0001
T      2        0.00000000 B    -                -         -
B      1       -6.00000000 B    1.29099445       -4.65     0.0009
B      2       -9.00000000 B    1.15470054       -7.79     <0.0001
B      3        0.00000000 B    -                -         -
T × B  1 1      2.00000000 B    1.73205081        1.15     0.2751
T × B  1 2     10.00000000 B    1.73205081        5.77     0.0002
T × B  1 3      0.00000000 B    -                -         -
T × B  2 1      0.00000000 B    -                -         -
T × B  2 2      0.00000000 B    -                -         -
T × B  2 3      0.00000000 B    -                -         -

Note: The X'X matrix has been found to be singular, and a generalized inverse was used to solve the
normal equations. Terms whose estimates are followed by the letter 'B' are not uniquely estimable.


The results of the SOLUTION option are shown in Table 10.6. Note that the results of the
SOLUTION option agree with the set-to-zero solution given in Section 10.2, which is repeated
here for convenience:

$$\hat\mu = 32,\quad \hat\tau_1 = -8,\quad \hat\beta_1 = -6,\quad \hat\beta_2 = -9,\quad \hat\gamma_{11} = 2,\quad \hat\gamma_{12} = 10$$

However, as pointed out before, these least squares estimates do not estimate their
respective parameters. In fact, as shown previously, the individual parameters are not
estimable. SAS-GLM indicates this by putting the letter B next to each of the least squares
estimates in Table 10.6. The functions of the parameters that these estimators are really
unbiased estimates of are those given in Table 10.4. That is, $\hat\mu = 32$ is the best unbiased
estimate of $\mu + \tau_2 + \beta_3 + \gamma_{23}$, $\hat\tau_1 = -8$ is the best unbiased estimate of $\tau_1 - \tau_2 + \gamma_{13} - \gamma_{23}$, $\hat\tau_2 = 0$
is estimating zero (which it does a good job of doing, too), $\hat\beta_1 = -6$ is the best unbiased
estimate of $\beta_1 - \beta_3 + \gamma_{21} - \gamma_{23}$, and so on. The standard errors printed in Table 10.6 are the actual
estimated standard errors of the estimators. That is, $\text{s.e.}(\hat\mu) = 0.8165$, $\text{s.e.}(\hat\tau_1) = 1.1547$, and so
on. The t-test statistics in Table 10.6 test that the corresponding function being estimated is
equal to zero. For example, t = 39.19, which corresponds to $\hat\mu$, tests $H_0\colon \mu + \tau_2 + \beta_3 + \gamma_{23} = 0$.
Such tests are usually not very interesting. By using CONTRAST or ESTIMATE statements
in the SAS-GLM procedure, one can make inferences about any estimable linear
combination of the parameters that one chooses to consider. The linear combination need not be a
contrast, but it must be estimable. Fortunately, SAS-GLM always checks whether or not the
specified linear combination is an estimable function. If it is, the ESTIMATE statement
gives its best unbiased estimate, its estimated standard error, and a t-statistic and p-value
that tests whether the parameter function being estimated is equal to zero or not. The
CONTRAST statement allows the simultaneous testing of several estimable functions, and
gives an F-statistic along with its p-value, as described in Chapter 1.

The proper use of the ESTIMATE statement for our example requires the following form:

ESTIMATE 'label' INTERCEPT c1 T c2 c3 B c4 c5 c6 T*B c7 c8 c9 c10 c11 c12;

If all of the coefficients of a particular effect are zero, that effect and its coefficients do not
need to be included in the ESTIMATE statement. To use the CONTRAST statement,
one needs only to replace the word ESTIMATE with CONTRAST in the above form. For
example, to obtain the best estimates of the estimable functions given in Table 10.5, one
would use:

1. ESTIMATE 'OVERALL MEAN' INTERCEPT 1 T .5 .5 B .33333 .33333 .33333
   T*B .16667 .16667 .16667 .16667 .16667 .16667;

2. ESTIMATE 'T1 - T2' T 1 -1 T*B .33333 .33333 .33333 -.33333 -.33333
   -.33333;

3. ESTIMATE 'B1 - B3' B 1 0 -1 T*B .5 0 -.5 .5 0 -.5;

4. ESTIMATE 'B2 - B3' B 0 1 -1 T*B 0 .5 -.5 0 .5 -.5;

5. ESTIMATE 'INT1' T*B 1 0 -1 -1 0 1;

6. ESTIMATE 'INT2' T*B 0 1 -1 0 -1 1;
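
Since $\mu_{ij} = \mu + \tau_i + \beta_j + \gamma_{ij}$, the T*B coefficients of an estimable function are also its weights on the cell means, so the values these six statements should estimate can be sketched without SAS. The code below is a hedged illustration, not SAS output: it assumes the cell means (20, 25, 24, 26, 23, 32) inferred from the cell totals in Section 10.2 and uses exact fractions in place of the rounded decimals.

```python
from fractions import Fraction

# Cell means in the T*B order 11, 12, 13, 21, 22, 23 (inferred, see lead-in).
CELL_MEANS = [20, 25, 24, 26, 23, 32]

half, third, sixth = Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)

# T*B coefficients of each ESTIMATE statement, written as exact fractions.
TB_COEFFICIENTS = {
    "OVERALL MEAN": [sixth] * 6,
    "T1 - T2": [third, third, third, -third, -third, -third],
    "B1 - B3": [half, 0, -half, half, 0, -half],
    "B2 - B3": [0, half, -half, 0, half, -half],
    "INT1": [1, 0, -1, -1, 0, 1],
    "INT2": [0, 1, -1, 0, -1, 1],
}

# Each estimate is the same linear combination applied to the cell means.
estimates = {label: sum(c * m for c, m in zip(coef, CELL_MEANS))
             for label, coef in TB_COEFFICIENTS.items()}

for label, value in estimates.items():
    print(f"{label:>12}: {float(value):g}")
```

The last two values coincide with $\hat\gamma_{11}$ and $\hat\gamma_{12}$ from the set-to-zero solution, as Table 10.4 and Table 10.5 share those two basis functions.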


10.4 Types I-IV Hypotheses 

Many readers of this book may already be aware that the SAS-GLM procedure gives users
the option of selecting one of four types of sums of squares for testing hypotheses. This
section is mainly concerned with defining and interpreting the corresponding four types
of hypotheses that are tested. The data in Table 9.1 will be used to illustrate these
hypotheses. As stated in Section 10.2, the type I sums of squares are obtained by fitting the
two-way effects model in a sequential fashion. The sum of squares obtained at each step, which
is a measure of the importance of the particular term being added at that step, is the amount
that the residual sum of squares can be reduced by including that term in the model. The
type II analysis is also obtained by utilizing the model comparison technique. The sums of
squares corresponding to each effect are adjusted for every other effect in the model that
is at the same or a lower level. Hence, the type II sum of squares corresponding to the
T-effect is $R(\tau\,|\,\mu, \beta)$ and the type II sum of squares corresponding to the B-effect
is $R(\beta\,|\,\mu, \tau)$. Readers who are slightly confused should see Section 16.3, because the
definitions of the type I and type II analyses for a three-way treatment structure will help
clarify the differences between the type I sum of squares and the type II sum of squares.
Table 10.7 shows the type I and type II sums of squares for two-way effects models and
Table 10.8 gives the type II sums of squares and test statistics for the data in Table 9.1.
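
The only type II quantity not already available from the sequential fit is $R(\tau\,|\,\mu, \beta)$, which needs the residual sum of squares of the model containing $\mu$ and $\beta$ only. A minimal sketch follows; it assumes the residual sums of squares quoted in Section 10.2 (258.9375, 182.375, and 91.631), uses the column totals from the normal equations, and relies on the fact that a one-way fit on B has residual sum of squares $\sum y_{ijk}^2 - \sum_j y_{\cdot j\cdot}^2/n_{\cdot j}$.

```python
SUM_Y2 = 10209                                           # sum of squared observations
COLUMN_TOTALS = {1: (5, 112), 2: (5, 119), 3: (6, 168)}  # j: (n_.j, y_.j.)

# Residual sums of squares quoted in the text for the sequential fit.
RSS_MU = 258.9375
RSS_MU_TAU = 182.375
RSS_MU_TAU_BETA = 91.631

# Model with mu and beta only: a one-way analysis on the B classification.
rss_mu_beta = SUM_Y2 - sum(total ** 2 / n for n, total in COLUMN_TOTALS.values())

type1_t = RSS_MU - RSS_MU_TAU            # R(tau | mu)
type2_t = rss_mu_beta - RSS_MU_TAU_BETA  # R(tau | mu, beta)
type2_b = RSS_MU_TAU - RSS_MU_TAU_BETA   # R(beta | mu, tau), same as type I here
r_beta_given_mu = RSS_MU - rss_mu_beta   # R(beta | mu), needed when B is fitted first

print(round(type1_t, 4), round(type2_t, 3), round(type2_b, 3), round(r_beta_given_mu, 4))
```

The type I and type II sums of squares for T differ because the first ignores B while the second adjusts for it.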

TABLE 10.7

Definitions of Types I and II Sums of Squares

Source of Variation   df               Type I SS                         Type II SS
T                     t − 1            $R(\tau\,|\,\mu)$                 $R(\tau\,|\,\mu, \beta)$
B                     b − 1            $R(\beta\,|\,\mu, \tau)$          $R(\beta\,|\,\mu, \tau)$
T × B                 (t − 1)(b − 1)   $R(\gamma\,|\,\mu, \tau, \beta)$  $R(\gamma\,|\,\mu, \tau, \beta)$

TABLE 10.8

Type II Analysis of Variance Table

Source of Variation   df    SS          MS        F        $\alpha$
Total                 15    258.9375
T                      1     72.369     72.369    36.18    0.0001
B                      2     90.744     45.372    22.69    0.0002
T × B                  2     71.631     35.815    17.91    0.0005
Error                 10     20.000      2.0

We advise experimenters to think in terms of the parameters in the means model rather
than the parameters in the effects model. The main advantage of using effects models to
model two-way experiments is that many of the interesting hypotheses are tested
automatically; this makes it easy for the experimenter to do. The disadvantage of the effects
model is that it is difficult for experimenters to understand exactly what is being tested by
the type I and type II sums of squares. The main advantage of the means model is that one
is able to understand exactly what is being tested and/or estimated. The disadvantage of
the means model is that the experimenter has to write her own ESTIMATE and CONTRAST
statements to get the test statistics of interest. We want experimenters to be knowledgeable
about both the effects model and the means model; the effects model can be used to
automatically get many statistics of interest and the means model to identify what is being
estimated or tested by one of the statistics of interest.

What hypotheses are being tested by the type I and type II sums of squares? The
hypotheses being tested by the type I sums of squares are called type I hypotheses. Table 10.9
gives the hypotheses tested by a type I analysis of the data in Table 9.1 for model (10.1)
in terms of the parameters in a means model. Later we shall show how to determine
these hypotheses from the SAS-GLM procedure. Clearly, an experimenter would rarely be
interested in the hypotheses that correspond to T and B in Table 10.9, and as the sample
sizes change in one or more of the cells in Table 9.1, the hypotheses would change. The
hypothesis corresponding to T × B is equivalent to testing a no-interaction hypothesis.
That is, the contrasts

$$\mu_{11} - \mu_{13} - \mu_{21} + \mu_{23} \quad \text{and} \quad \mu_{12} - \mu_{13} - \mu_{22} + \mu_{23}$$

span the interaction space for this particular example. Table 10.10 gives general
formulations for the type I hypotheses for a two-way experiment expressed in terms of the means
model parameters. Note that from the general form we can see that the type I hypothesis
for T compares the weighted averages of row cell means using the sample sizes within
each cell as weights. For example, the type I hypothesis for T using the data in Table 9.1 can be
written as

$$H_0\colon \frac{3\mu_{11} + 2\mu_{12} + 3\mu_{13}}{8} = \frac{2\mu_{21} + 3\mu_{22} + 3\mu_{23}}{8}$$



TABLE 10.9

Hypotheses for a Type I Analysis of the Means Model for Data in Table 9.1

Source of Variation   Type I Hypotheses

T       $3\mu_{11} + 2\mu_{12} + 3\mu_{13} - 2\mu_{21} - 3\mu_{22} - 3\mu_{23} = 0$

B       $37\mu_{11} - 2\mu_{12} - 35\mu_{13} + 28\mu_{21} + 2\mu_{22} - 30\mu_{23} = 0$ and
        $2\mu_{11} + 28\mu_{12} - 30\mu_{13} - 2\mu_{21} + 37\mu_{22} - 35\mu_{23} = 0$

T x B   $\mu_{11} - \mu_{13} - \mu_{21} + \mu_{23} = 0$ and $\mu_{12} - \mu_{13} - \mu_{22} + \mu_{23} = 0$

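TABLE 10.10

General Forms of Type I Hypotheses for a Two-Way Effects Model
Expressed in Terms of Means Model Parameters

Source of Variation   Type I Hypotheses

T       $\frac{1}{n_{1\cdot}} \sum_{j=1}^{b} n_{1j}\mu_{1j} = \frac{1}{n_{2\cdot}} \sum_{j=1}^{b} n_{2j}\mu_{2j} = \cdots = \frac{1}{n_{t\cdot}} \sum_{j=1}^{b} n_{tj}\mu_{tj}$

B       $\sum_{i=1}^{t} \left( n_{ij} - \frac{n_{ij}^2}{n_{i\cdot}} \right) \mu_{ij} = \sum_{i=1}^{t} \sum_{j' \ne j} \frac{n_{ij} n_{ij'}}{n_{i\cdot}} \mu_{ij'}$,  $j = 1, 2, \ldots, b$

T x B   $\mu_{ij} - \mu_{i'j} - \mu_{ij'} + \mu_{i'j'} = 0$ for all $i \ne i'$, $j \ne j'$
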

For the data in Table 9.1, the hypotheses tested by the type II analysis in terms of the 
means model parameters are given in Table 10.11. 

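For the general model, the type II hypothesis tested by the row corresponding to the
T-effect is

$$\sum_{j=1}^{b} \left( n_{ij} - \frac{n_{ij}^2}{n_{\cdot j}} \right) \mu_{ij} = \sum_{i' \ne i} \sum_{j=1}^{b} \frac{n_{ij} n_{i'j}}{n_{\cdot j}} \mu_{i'j}, \quad i = 1, 2, \ldots, t \qquad (10.3)$$
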


The other two rows are the same as they were for the type I analysis. 

The type I and type II hypotheses in terms of the effects model parameters in Equation 
10.1 for the data in Table 9.1 are given in Tables 10.12 and 10.13. 


TABLE 10.11

Hypotheses for a Type II Analysis in Terms of the Means
Model for the Data in Table 9.1

Source of Variation   Type II Hypotheses

T       $4\mu_{11} + 4\mu_{12} + 5\mu_{13} - 4\mu_{21} - 4\mu_{22} - 5\mu_{23} = 0$

B       Same as for type I analysis

T x B   Same as for type I analysis


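TABLE 10.12

Hypotheses for a Type I Analysis in Terms of the Effects Model for the Data in Table 9.1

Source of Variation   Type I Hypotheses

T       $\tau_1 - \tau_2 + (1/8)(\beta_1 - \beta_2) + (1/8)(3\gamma_{11} + 2\gamma_{12} + 3\gamma_{13} - 2\gamma_{21} - 3\gamma_{22} - 3\gamma_{23}) = 0$

B       $\beta_1 - \beta_3 + (1/65)(37\gamma_{11} - 2\gamma_{12} - 35\gamma_{13} + 28\gamma_{21} + 2\gamma_{22} - 30\gamma_{23}) = 0$ and
        $\beta_2 - \beta_3 + (1/65)(2\gamma_{11} + 28\gamma_{12} - 30\gamma_{13} - 2\gamma_{21} + 37\gamma_{22} - 35\gamma_{23}) = 0$

T x B   $\gamma_{11} - \gamma_{13} - \gamma_{21} + \gamma_{23} = 0$ and $\gamma_{12} - \gamma_{13} - \gamma_{22} + \gamma_{23} = 0$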
TABLE 10.13

Hypotheses for a Type II Analysis in Terms of the Effects Model for the Data
in Table 9.1

Source of Variation   Type II Hypotheses

T       $\tau_1 - \tau_2 + (1/13)(4\gamma_{11} + 4\gamma_{12} + 5\gamma_{13} - 4\gamma_{21} - 4\gamma_{22} - 5\gamma_{23}) = 0$

B       $\beta_1 - \beta_3 + (1/65)(37\gamma_{11} - 2\gamma_{12} - 35\gamma_{13} + 28\gamma_{21} + 2\gamma_{22} - 30\gamma_{23}) = 0$ and
        $\beta_2 - \beta_3 + (1/65)(2\gamma_{11} + 28\gamma_{12} - 30\gamma_{13} - 2\gamma_{21} + 37\gamma_{22} - 35\gamma_{23}) = 0$

T x B   $\gamma_{11} - \gamma_{13} - \gamma_{21} + \gamma_{23} = 0$ and $\gamma_{12} - \gamma_{13} - \gamma_{22} + \gamma_{23} = 0$


Examination of Tables 10.10-10.13 reveals that the main effect hypotheses tested by the
type I and type II analyses may not be very interesting. In addition, the rejection or
acceptance of these hypotheses may not be easy to interpret.

A third way to compute sums of squares for an effects model is as follows:

1) For the t levels of treatment T, generate t - 1 dummy variables, and for the b levels
of treatment B, generate b - 1 dummy variables. These dummy variables
corresponding to the $\tau_i$ and $\beta_j$ are created so that in the model with the dummy variables
$\tau_t = -(\tau_1 + \tau_2 + \cdots + \tau_{t-1})$ and $\beta_b = -(\beta_1 + \beta_2 + \cdots + \beta_{b-1})$.

2) The interaction between T and B is represented by the products of their
corresponding dummy variables. In particular,
$\gamma_{tj} = -(\gamma_{1j} + \gamma_{2j} + \cdots + \gamma_{t-1,j})$ for $j = 1, 2, \ldots, b$
and $\gamma_{ib} = -(\gamma_{i1} + \gamma_{i2} + \cdots + \gamma_{i,b-1})$ for $i = 1, 2, \ldots, t$.

3) The model with all of the dummy variables for the treatment variables and their
interactions is fitted, and the residual sum of squares is obtained. This is
equivalent to the residual sum of squares from the full-effects model.

4) Next, a model is fitted that contains all of the dummy variables except those
corresponding to the main effect or interaction being tested. The difference between
the residual sum of squares of this reduced model and that of the model in 3 is the
sum of squares corresponding to that effect.

The resulting analysis is called a type III analysis. It is also known as Yates's weighted
squares of means technique. When all treatment combinations are observed, the hypotheses
tested by a type III analysis are the same as those tested for balanced data sets. These type III
hypotheses in terms of means model parameters are given in Table 10.14 and in terms of effects
model parameters in Table 10.15. These hypotheses are usually the ones desired by
experimenters, and the type III sums of squares for the data in Table 9.1 are given in Table 10.16.


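TABLE 10.14

Type III Hypotheses for the Effects Model in Terms of the
Means Model Parameters

Source of Variation   Hypotheses

T       $\bar{\mu}_{1\cdot} = \bar{\mu}_{2\cdot} = \cdots = \bar{\mu}_{t\cdot}$

B       $\bar{\mu}_{\cdot 1} = \bar{\mu}_{\cdot 2} = \cdots = \bar{\mu}_{\cdot b}$

T x B   $\mu_{ij} - \mu_{i'j} - \mu_{ij'} + \mu_{i'j'} = 0$ for all $i \ne i'$, $j \ne j'$
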
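TABLE 10.15

Type III Hypotheses for the Effects Model in Terms of the
Effects Model Parameters

Source of Variation   Hypotheses

T       $\tau_1 + \bar{\gamma}_{1\cdot} = \tau_2 + \bar{\gamma}_{2\cdot} = \cdots = \tau_t + \bar{\gamma}_{t\cdot}$

B       $\beta_1 + \bar{\gamma}_{\cdot 1} = \beta_2 + \bar{\gamma}_{\cdot 2} = \cdots = \beta_b + \bar{\gamma}_{\cdot b}$

T x B   $\gamma_{ij} - \gamma_{i'j} - \gamma_{ij'} + \gamma_{i'j'} = 0$ for all $i \ne i'$, $j \ne j'$
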

TABLE 10.16

Type III Analysis of Variance Table

Source of Variation   df   SS         MS       F       $\hat{\alpha}$
Total                 15   258.9375
T                      1    61.714    61.714   30.86   0.0002
B                      2    77.169    38.585   19.29   0.0004
T x B                  2    71.631    35.815   17.91   0.0005
Error                 10    20.000     2.0




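The matrix form of the model reparameterized by Yates's method for the data in
Table 9.1 is

$$
\begin{bmatrix}
19 \\ 20 \\ 21 \\ 24 \\ 26 \\ 22 \\ 25 \\ 25 \\ 25 \\ 27 \\ 21 \\ 24 \\ 24 \\ 31 \\ 32 \\ 33
\end{bmatrix}
=
\left[\begin{array}{rrrrrr}
1 &  1 &  1 &  0 &  1 &  0 \\
1 &  1 &  1 &  0 &  1 &  0 \\
1 &  1 &  1 &  0 &  1 &  0 \\
1 &  1 &  0 &  1 &  0 &  1 \\
1 &  1 &  0 &  1 &  0 &  1 \\
1 &  1 & -1 & -1 & -1 & -1 \\
1 &  1 & -1 & -1 & -1 & -1 \\
1 &  1 & -1 & -1 & -1 & -1 \\
1 & -1 &  1 &  0 & -1 &  0 \\
1 & -1 &  1 &  0 & -1 &  0 \\
1 & -1 &  0 &  1 &  0 & -1 \\
1 & -1 &  0 &  1 &  0 & -1 \\
1 & -1 &  0 &  1 &  0 & -1 \\
1 & -1 & -1 & -1 &  1 &  1 \\
1 & -1 & -1 & -1 &  1 &  1 \\
1 & -1 & -1 & -1 &  1 &  1
\end{array}\right]
\begin{bmatrix}
\mu \\ \tau_1 \\ \beta_1 \\ \beta_2 \\ \gamma_{11} \\ \gamma_{12}
\end{bmatrix}
+
\begin{bmatrix}
\varepsilon_{111} \\ \varepsilon_{112} \\ \varepsilon_{113} \\ \varepsilon_{121} \\ \varepsilon_{122} \\ \varepsilon_{131} \\ \varepsilon_{132} \\ \varepsilon_{133} \\ \varepsilon_{211} \\ \varepsilon_{212} \\ \varepsilon_{221} \\ \varepsilon_{222} \\ \varepsilon_{223} \\ \varepsilon_{231} \\ \varepsilon_{232} \\ \varepsilon_{233}
\end{bmatrix}
$$
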


Note that the fifth column (corresponding to $\gamma_{11}$) in the above design matrix is the product
of the second and third columns (columns corresponding to $\tau_1$ and $\beta_1$), and the sixth column
(corresponding to $\gamma_{12}$) is the product of the second and fourth columns (columns
corresponding to $\tau_1$ and $\beta_2$).
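
The four-step procedure and the design matrix above can be sketched numerically. The following is a minimal illustration in Python with NumPy, not part of the SAS-GLM workflow used in this chapter; the helper names `effect_code` and `rss` are hypothetical, and the responses are read off the response vector in the matrix form above. Under these assumptions, each type III sum of squares is the increase in residual sum of squares when the corresponding dummy variables are dropped from the full model.

```python
import numpy as np

# Responses read off the response vector in the matrix form above (2 x 3 layout).
y = np.array([19, 20, 21, 24, 26, 22, 25, 25,
              25, 27, 21, 24, 24, 31, 32, 33], dtype=float)
T = np.array([1] * 8 + [2] * 8)                                  # level of T per observation
B = np.array([1, 1, 1, 2, 2, 3, 3, 3, 1, 1, 2, 2, 2, 3, 3, 3])   # level of B per observation


def effect_code(levels, n_levels):
    """Sum-to-zero dummies: level k gets +1 in column k, the last level gets -1 in every column."""
    cols = np.zeros((len(levels), n_levels - 1))
    for k in range(1, n_levels):
        cols[levels == k, k - 1] = 1.0
    cols[levels == n_levels, :] = -1.0
    return cols


def rss(X, y):
    """Residual sum of squares from a least squares fit of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)


one = np.ones((len(y), 1))
XT, XB = effect_code(T, 2), effect_code(B, 3)
# Step 2: interaction dummies are products of the main-effect dummies.
XTB = np.column_stack([XT[:, i] * XB[:, j]
                       for i in range(XT.shape[1]) for j in range(XB.shape[1])])

# Step 3: residual sum of squares of the full model.
rss_full = rss(np.hstack([one, XT, XB, XTB]), y)

# Step 4: drop one set of dummies at a time and difference the residual sums of squares.
type3_ss = {
    "T": rss(np.hstack([one, XB, XTB]), y) - rss_full,
    "B": rss(np.hstack([one, XT, XTB]), y) - rss_full,
    "TxB": rss(np.hstack([one, XT, XB]), y) - rss_full,
}
for name, ss in type3_ss.items():
    print(f"{name}: SS = {ss:.3f}")
```

The residual sum of squares of the full model should reproduce the error sum of squares of 20, and the three differences can be compared with the SS column of Table 10.16.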
SAS-GLM introduced a fourth way of generating sums of squares corresponding to the
main effects and their interactions. When all treatment combinations are observed, the 
hypotheses tested by this type IV analysis are the same as those tested by the type III 
analysis; however, when some treatment combinations are not observed, the type III and 
type IV analyses do not agree. We shall discuss the construction of type IV hypotheses in 
Chapter 14. 

To conclude this section, we make the following recommendations when analyzing data 
in a two-way treatment structure model with no missing treatment combinations: 

1) If the experimenter wants to compare the effects of the two treatments, she should 
look at hypotheses tested by a type III analysis. These hypotheses are equivalent 
to the hypotheses tested in the balanced or equal-subclass-numbers case. 

2) If the experimenter is interested in building a model with which to predict the 
effects of particular treatment combinations, then he or she could use type I and/ 
or type II analyses. 

3) In sample survey experiments, the number of observations per treatment
combination is often proportional to the frequency with which those combinations
actually occur in the population. In this case, the experimenter may be most interested
in the hypotheses based on $R(\tau \mid \mu)$ and $R(\beta \mid \mu)$, since these sums of squares
test hypotheses about the weighted averages of the row means and the column
means with the weights proportional to the observed sample sizes. This may
require two type I analyses if one were using SAS-GLM, one with T first in the
model and another with B first.


10.5 Using Types I-IV Estimable Functions in SAS-GLM 

In this section, we show how one can use the information provided by SAS-GLM to
determine the hypotheses being tested by the different types of sums of squares. (Readers
who do not use SAS-GLM can skip this section.) In Section 10.3 we discussed the general
form of estimable functions obtained by using the E option on the MODEL statement. If
the E1 option is chosen, SAS-GLM will print the general form of the type I estimable
functions for each effect. The results of this option for the data in Table 9.1 when using an
effects model are shown in Table 10.17.

From Table 10.17, we see that $\ell'\beta$, a linear combination of the parameter vector
$\beta' = [\mu, \tau_1, \tau_2, \beta_1, \beta_2, \beta_3, \gamma_{11}, \gamma_{12}, \gamma_{13}, \gamma_{21}, \gamma_{22}, \gamma_{23}]$,
is a type I estimable function for T if and only if there exists a constant L2 such that

$$
\begin{aligned}
\ell'\beta = {} & (\mathrm{L2})\tau_1 - (\mathrm{L2})\tau_2 + (0.125 \cdot \mathrm{L2})\beta_1 - (0.125 \cdot \mathrm{L2})\beta_2 \\
& + (0.375 \cdot \mathrm{L2})\gamma_{11} + (0.25 \cdot \mathrm{L2})\gamma_{12} + (0.375 \cdot \mathrm{L2})\gamma_{13} \\
& - (0.25 \cdot \mathrm{L2})\gamma_{21} - (0.375 \cdot \mathrm{L2})\gamma_{22} - (0.375 \cdot \mathrm{L2})\gamma_{23}
\end{aligned}
$$

A basis set for the type I estimable functions for T can be obtained by choosing a specific
value for L2, say L2 = 1 or L2 = 8. We are free to choose only one of the L in the general form
of an estimable function, which corresponds to the 1 degree of freedom associated with the
type I sum of squares for T; all of the remaining L are determined by our choice for L2. For
our example, L1 = 0, L4 = 0.125 × L2, L5 = -0.125 × L2, L7 = 0.375 × L2, and L8 = 0.25 × L2.
TABLE 10.17

Type I Estimable Functions from SAS-GLM for the Data in Table 9.1

                        Coefficients
Effect          T              B                             T x B
Intercept       0              0                             0
T      1        L2             0                             0
T      2        -L2            0                             0
B      1        0.125 × L2     L4                            0
B      2        -0.125 × L2    L5                            0
B      3        0              -L4 - L5                      0
T x B  1 1      0.375 × L2     0.5692 × L4 + 0.0308 × L5     L7
T x B  1 2      0.25 × L2      -0.0308 × L4 + 0.4308 × L5    L8
T x B  1 3      0.375 × L2     -0.5385 × L4 - 0.4615 × L5    -L7 - L8
T x B  2 1      -0.25 × L2     0.4308 × L4 - 0.0308 × L5     -L7
T x B  2 2      -0.375 × L2    0.0308 × L4 + 0.5692 × L5     -L8
T x B  2 3      -0.375 × L2    -0.4615 × L4 - 0.5385 × L5    L7 + L8



Taking L2 = 1, a basis set for the type I estimable functions for T is

$\{\tau_1 - \tau_2 + (1/8)(\beta_1 - \beta_2) + (1/8)(3\gamma_{11} + 2\gamma_{12} + 3\gamma_{13} - 2\gamma_{21} - 3\gamma_{22} - 3\gamma_{23})\}$

This function of the parameters is the one that is compared to zero by a type I analysis. See
Table 10.12. Another basis set can be constructed by taking L2 = 8. This set is given by

$\{8\tau_1 - 8\tau_2 + \beta_1 - \beta_2 + 3\gamma_{11} + 2\gamma_{12} + 3\gamma_{13} - 2\gamma_{21} - 3\gamma_{22} - 3\gamma_{23}\}$

Since $\mu_{ij} = \mu + \tau_i + \beta_j + \gamma_{ij}$, the hypotheses being tested in terms of the means model
parameters can be determined by assigning the coefficients on the $\gamma_{ij}$ in the effects model
representation to the $\mu_{ij}$ in the means model representation. Thus, the type I hypothesis for T in
terms of the means model parameters is

$(3\mu_{11} + 2\mu_{12} + 3\mu_{13} - 2\mu_{21} - 3\mu_{22} - 3\mu_{23})/8 = 0$ or equivalently that
$3\mu_{11} + 2\mu_{12} + 3\mu_{13} - 2\mu_{21} - 3\mu_{22} - 3\mu_{23} = 0$

which is the hypothesis for T given in Table 10.9.

If the E2 option on the MODEL statement is chosen, SAS-GLM will print the general
form of the type II estimable functions for each effect. The results for the data in Table 9.1
are shown in Table 10.18.

From Table 10.18, we see that $\ell'\beta$ is a type II estimable function for B if and only if there
exist constants L4 and L5 such that

$$
\begin{aligned}
\ell'\beta = {} & (\mathrm{L4})\beta_1 + (\mathrm{L5})\beta_2 + (-\mathrm{L4} - \mathrm{L5})\beta_3 + (0.5692 \cdot \mathrm{L4} + 0.0308 \cdot \mathrm{L5})\gamma_{11} \\
& + (-0.0308 \cdot \mathrm{L4} + 0.4308 \cdot \mathrm{L5})\gamma_{12} + (-0.5385 \cdot \mathrm{L4} - 0.4615 \cdot \mathrm{L5})\gamma_{13} \\
& + (0.4308 \cdot \mathrm{L4} - 0.0308 \cdot \mathrm{L5})\gamma_{21} + (0.0308 \cdot \mathrm{L4} + 0.5692 \cdot \mathrm{L5})\gamma_{22} \\
& + (-0.4615 \cdot \mathrm{L4} - 0.5385 \cdot \mathrm{L5})\gamma_{23}
\end{aligned}
$$

In this case we can choose values for two of the L, namely L4 and L5. Thus, there are 2
degrees of freedom corresponding to the type II sum of squares for B. With a little luck
TABLE 10.18

Type II Estimable Functions from SAS for Data in Table 9.1

                        Coefficients
Effect          T               B                             T x B
Intercept       0               0                             0
T      1        L2              0                             0
T      2        -L2             0                             0
B      1        0               L4                            0
B      2        0               L5                            0
B      3        0               -L4 - L5                      0
T x B  1 1      0.3077 × L2     0.5692 × L4 + 0.0308 × L5     L7
T x B  1 2      0.3077 × L2     -0.0308 × L4 + 0.4308 × L5    L8
T x B  1 3      0.3846 × L2     -0.5385 × L4 - 0.4615 × L5    -L7 - L8
T x B  2 1      -0.3077 × L2    0.4308 × L4 - 0.0308 × L5     -L7
T x B  2 2      -0.3077 × L2    0.0308 × L4 + 0.5692 × L5     -L8
T x B  2 3      -0.3846 × L2    -0.4615 × L4 - 0.5385 × L5    L7 + L8



or by using the general forms given in Table 10.10, one can determine that the decimal
numbers given in the above expression have a lowest common denominator of 65.
Choosing L4 = 1 and L5 = 0 and then L4 = 0 and L5 = 1 provides one basis set for the type II
estimable functions for B. This set is

$\{\beta_1 - \beta_3 + (1/65)(37\gamma_{11} - 2\gamma_{12} - 35\gamma_{13} + 28\gamma_{21} + 2\gamma_{22} - 30\gamma_{23})$ and
$\beta_2 - \beta_3 + (1/65)(2\gamma_{11} + 28\gamma_{12} - 30\gamma_{13} - 2\gamma_{21} + 37\gamma_{22} - 35\gamma_{23})\}$

in terms of the parameters in the effects model and

$\{37\mu_{11} - 2\mu_{12} - 35\mu_{13} + 28\mu_{21} + 2\mu_{22} - 30\mu_{23}$ and
$2\mu_{11} + 28\mu_{12} - 30\mu_{13} - 2\mu_{21} + 37\mu_{22} - 35\mu_{23}\}$

in terms of the parameters in the means model by letting L4 = 65 and L5 = 0 for the first
function and by letting L4 = 0 and L5 = 65 for the second function.

From the general form of the type I and/or type II estimable functions for T x B in Table
10.17 and/or Table 10.18, we see that we have two L for which we can choose values. A basis
set for the type I estimable functions for T x B can be obtained by letting L7 = 1 and L8 = 0
and then L7 = 0 and L8 = 1. These two choices yield

$\{\gamma_{11} - \gamma_{13} - \gamma_{21} + \gamma_{23}$ and $\gamma_{12} - \gamma_{13} - \gamma_{22} + \gamma_{23}\}$

in terms of the parameters in the effects model and

$\{\mu_{11} - \mu_{13} - \mu_{21} + \mu_{23}$ and $\mu_{12} - \mu_{13} - \mu_{22} + \mu_{23}\}$

in terms of the parameters in the means model. The reader might wish to verify that every
2 x 2 table difference $\mu_{ij} - \mu_{i'j} - \mu_{ij'} + \mu_{i'j'} = 0$ for all $i \ne i'$, $j \ne j'$ can be obtained as
some linear combination of the two functions in this basis set.
From Table 10.18, the general form of type II estimable functions for T is given by

$$
\begin{aligned}
& (\mathrm{L2})\tau_1 - (\mathrm{L2})\tau_2 + (0.3077 \cdot \mathrm{L2})\gamma_{11} + (0.3077 \cdot \mathrm{L2})\gamma_{12} + (0.3846 \cdot \mathrm{L2})\gamma_{13} \\
& + (-0.3077 \cdot \mathrm{L2})\gamma_{21} + (-0.3077 \cdot \mathrm{L2})\gamma_{22} + (-0.3846 \cdot \mathrm{L2})\gamma_{23}
\end{aligned}
$$

The lowest common denominator of these decimal fractions is 13; thus, by taking L2 = 1,
a basis set of type II estimable functions for T is

$\{\tau_1 - \tau_2 + (1/13)(4\gamma_{11} + 4\gamma_{12} + 5\gamma_{13} - 4\gamma_{21} - 4\gamma_{22} - 5\gamma_{23})\}$

in terms of the parameters in the effects model and by

$\{4\mu_{11} + 4\mu_{12} + 5\mu_{13} - 4\mu_{21} - 4\mu_{22} - 5\mu_{23}\}$

in terms of the parameters in the means model (taking L2 = 13).
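
The decimal coefficients printed by SAS-GLM can be traced back to the cell sample sizes. The sketch below is an illustration only, assuming the cell sizes 3, 2, 3 and 2, 3, 3 that appear as weights in the type I hypothesis for T in Table 10.9; the helper `normalize` and the variable names are hypothetical. It uses exact fractions to recover the type I weights (0.375, 0.25, 0.375) and the type II weights (4/13, 4/13, 5/13) from the general forms for a two-row layout.

```python
from fractions import Fraction

# Cell sample sizes n_ij (rows T1, T2; columns B1, B2, B3); assumed from the
# weights in the type I hypothesis for T in Table 10.9.
n = [[3, 2, 3],
     [2, 3, 3]]
b = len(n[0])
col_totals = [n[0][j] + n[1][j] for j in range(b)]


def normalize(weights):
    """Scale a list of weights so that they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]


# Type I (T fitted first): the weights within row i are n_ij / n_i.
type1_row1 = normalize([Fraction(x) for x in n[0]])
type1_row2 = normalize([Fraction(x) for x in n[1]])

# Type II (T adjusted for B): weights proportional to n_1j - n_1j^2 / n_.j,
# which for two rows equals n_1j * n_2j / n_.j, so both rows share the same weights.
type2_row1 = normalize([Fraction(n[0][j] * n[1][j], col_totals[j]) for j in range(b)])

print("type I, row 1: ", [str(w) for w in type1_row1])
print("type I, row 2: ", [str(w) for w in type1_row2])
print("type II, rows: ", [str(w) for w in type2_row1])
```

Working with `Fraction` rather than floats makes the common denominators 8 and 13 visible directly, which is the step the text performs by inspection of the decimal output.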

Finally, we consider the type III and type IV estimable functions. For the data in Table 9.1,
these are the same; they are shown in Table 10.19. A basis set of type III estimable functions
for T is

$\{\tau_1 - \tau_2 + \bar{\gamma}_{1\cdot} - \bar{\gamma}_{2\cdot}\}$

in terms of the parameters of the effects model and

$\{\bar{\mu}_{1\cdot} - \bar{\mu}_{2\cdot}\}$

in terms of the parameters of the means model (taking L2 = 1 in both cases). In a similar
manner, we see that basis sets for the type III estimable functions for B are

$\{\beta_1 - \beta_3 + \bar{\gamma}_{\cdot 1} - \bar{\gamma}_{\cdot 3}$ and $\beta_2 - \beta_3 + \bar{\gamma}_{\cdot 2} - \bar{\gamma}_{\cdot 3}\}$


TABLE 10.19

Type III Estimable Functions for the Data in Table 9.1

                        Coefficients
Effect          T               B                        T x B
Intercept       0               0                        0
T      1        L2              0                        0
T      2        -L2             0                        0
B      1        0               L4                       0
B      2        0               L5                       0
B      3        0               -L4 - L5                 0
T x B  1 1      0.3333 × L2     0.5 × L4                 L7
T x B  1 2      0.3333 × L2     0.5 × L5                 L8
T x B  1 3      0.3333 × L2     -0.5 × L4 - 0.5 × L5     -L7 - L8
T x B  2 1      -0.3333 × L2    0.5 × L4                 -L7
T x B  2 2      -0.3333 × L2    0.5 × L5                 -L8
T x B  2 3      -0.3333 × L2    -0.5 × L4 - 0.5 × L5     L7 + L8

in terms of the parameters in the effects model and

$\{\bar{\mu}_{\cdot 1} - \bar{\mu}_{\cdot 3}$ and $\bar{\mu}_{\cdot 2} - \bar{\mu}_{\cdot 3}\}$

in terms of the parameters in the means model. Note that
$H_0\colon \bar{\mu}_{\cdot 1} - \bar{\mu}_{\cdot 3} = 0$ and $\bar{\mu}_{\cdot 2} - \bar{\mu}_{\cdot 3} = 0$
is true if and only if $H_0\colon \bar{\mu}_{\cdot 1} = \bar{\mu}_{\cdot 2} = \bar{\mu}_{\cdot 3}$ is true. The reader can verify
that the type IV estimable functions are identical to the type III estimable functions for
the data in Table 9.1.


10.6 Population Marginal Means and Least Squares Means 

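The population marginal means for the two-way effects model are defined by

$\bar{\mu}_{i\cdot} = \mu + \tau_i + \bar{\beta}_{\cdot} + \bar{\gamma}_{i\cdot}, \quad i = 1, 2, \ldots, t$

for the T-treatments and

$\bar{\mu}_{\cdot j} = \mu + \bar{\tau}_{\cdot} + \beta_j + \bar{\gamma}_{\cdot j}, \quad j = 1, 2, \ldots, b$

for the B-treatments. The best estimates of these marginal means are

$\hat{\bar{\mu}}_{i\cdot} = \hat{\mu} + \hat{\tau}_i + \hat{\bar{\beta}}_{\cdot} + \hat{\bar{\gamma}}_{i\cdot}, \; i = 1, 2, \ldots, t$ and $\hat{\bar{\mu}}_{\cdot j} = \hat{\mu} + \hat{\bar{\tau}}_{\cdot} + \hat{\beta}_j + \hat{\bar{\gamma}}_{\cdot j}, \; j = 1, 2, \ldots, b$

respectively, and these estimates of the marginal means are often called least squares
means. Their respective standard errors are given by Equations 9.5 and 9.7. To make
inferences about linear combinations of the population marginal means, one can use (9.5) and
(9.6). If one uses the
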

LSMEANS T B T*B/PDIFF; 

option in the SAS-GLM procedure, one gets the best unbiased estimates of the marginal
means and the two-way cell means as well as pairwise comparisons among certain subsets
of these means. These are given in Table 10.20. Note that the least squares means for T and
B are the same as the T and B means given in Table 9.2, and the two-way means are the
same as the cell means in Table 9.2. Table 10.20 also provides the estimated standard errors
of the least squares means and p-values for pairwise comparisons between the least squares
means. For example, the estimated standard error of $\hat{\bar{\mu}}_{1\cdot}$ is 0.51 and this agrees with the
estimated standard error that was computed in Section 9.5. Also the p-value that compares
$\bar{\mu}_{1\cdot}$ with $\bar{\mu}_{2\cdot}$ is given in Table 10.20 as 0.0002 and this agrees with the p-value of the t-statistic
calculated in Section 9.5. Similar comparisons can be made concerning other estimates,
standard errors, and test statistics.
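
As a rough check on the LSMEANS output, the least squares means and their standard errors can be recomputed from the cell means, the cell sample sizes, and the error mean square. The sketch below is an illustration in plain Python, not SAS code; it assumes the cell means 20, 25, 24, 26, 23, 32, cell sizes 3, 2, 3, 2, 3, 3, and an error mean square of 2.0, and the function names `lsmean_T` and `lsmean_B` are hypothetical.

```python
import math

# Cell means and cell sample sizes (rows T1, T2; columns B1, B2, B3), assumed
# from the T*B least squares means and the hypotheses quoted in this chapter.
cell_mean = [[20.0, 25.0, 24.0], [26.0, 23.0, 32.0]]
cell_n = [[3, 2, 3], [2, 3, 3]]
error_ms = 2.0          # error mean square: 20 on 10 degrees of freedom
t, b = 2, 3


def lsmean_T(i):
    """Unweighted average of the cell means in row i and its estimated standard error."""
    est = sum(cell_mean[i]) / b
    se = math.sqrt(error_ms * sum(1.0 / nij for nij in cell_n[i])) / b
    return est, se


def lsmean_B(j):
    """Unweighted average of the cell means in column j and its estimated standard error."""
    est = sum(cell_mean[i][j] for i in range(t)) / t
    se = math.sqrt(error_ms * sum(1.0 / cell_n[i][j] for i in range(t))) / t
    return est, se


for i in range(t):
    est, se = lsmean_T(i)
    print(f"T{i + 1}: LSMean = {est:.4f}, s.e. = {se:.7f}")
for j in range(b):
    est, se = lsmean_B(j)
    print(f"B{j + 1}: LSMean = {est:.4f}, s.e. = {se:.7f}")
```

The printed values can be compared with the Standard Error column for T and B in Table 10.20.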


10.7 Computer Analyses 

Nearly all the statistical computing packages have been developed in order to deal with 
effects models rather than means models. Since three major types of hypotheses can be 
TABLE 10.20

Best Estimates of the Marginal Means and Cell Means

Least Squares Means

                                      H0: LSMean = 0    H0: LSMean1 = LSMean2
T    Y LSMean      Standard Error     Pr > |t|          Pr > |t|
1    23.0000000    0.5091751          <0.0001           0.0002
2    27.0000000    0.5091751          <0.0001

B    Y LSMean      Standard Error     Pr > |t|    LSMean Number
1    23.0000000    0.6454972          <0.0001     1
2    24.0000000    0.6454972          <0.0001     2
3    28.0000000    0.5773503          <0.0001     3

Least Squares Means for Effect B
Pr > |t| for H0: LSMean(i) = LSMean(j)

i/j    1         2         3
1                0.2990    0.0002
2      0.2990              0.0010
3      0.0002    0.0010

T    B    Y LSMean      Standard Error     Pr > |t|    LSMean Number
1    1    20.0000000    0.8164966          <0.0001     1
1    2    25.0000000    1.0000000          <0.0001     2
1    3    24.0000000    0.8164966          <0.0001     3
2    1    26.0000000    1.0000000          <0.0001     4
2    2    23.0000000    0.8164966          <0.0001     5
2    3    32.0000000    0.8164966          <0.0001     6

Least Squares Means for Effect T x B
Pr > |t| for H0: LSMean(i) = LSMean(j)

i/j    1          2         3          4         5          6
1                 0.0031    0.0061     0.0009    0.0266     <0.0001
2      0.0031               0.4565     0.4956    0.1524     0.0003
3      0.0061     0.4565               0.1524    0.4068     <0.0001
4      0.0009     0.4956    0.1524               0.0425     0.0009
5      0.0266     0.1524    0.4068     0.0425               <0.0001
6      <0.0001    0.0003    <0.0001    0.0009    <0.0001


tested for the balanced two-way treatment structure with unequal subclass numbers, one
must be careful to determine which type is tested by the statistical package being used.
Although the computing packages have been developed with effects models in mind, the
means model can easily be implemented as well. Using the means model allows the user
to specify meaningful contrasts among the treatment means. The names used in this
discussion, that is, types I, II, III, and IV, correspond to the names used by SAS. SPSS uses
the same notation and will produce type I, II, and III analyses. The default in SPSS is the
type III analysis. SAS-GLM produces both the type I analysis and the type III analysis by
default. Readers who use other computing packages are encouraged to analyze the data in
Table 9.1 and then compare their analyses with those given in this chapter and
Chapter 9. Such a comparison should reveal the type of analysis their computing package
is producing. SAS-GLM also allows means to be calculated using a

MEANS T B T*B/<options>; 

statement. However, with unbalanced data the MEANS statement does not give unbiased
estimates of the population marginal means; instead, it computes estimates of weighted
averages of the row means and of the column means. That is, the above MEANS statement gives
the best unbiased estimates of the weighted means defined by

$$\tilde{\mu}_{i\cdot} = \frac{1}{n_{i\cdot}} \sum_{j=1}^{b} n_{ij}\mu_{ij}, \; i = 1, 2, \ldots, t \quad \text{and} \quad \tilde{\mu}_{\cdot j} = \frac{1}{n_{\cdot j}} \sum_{i=1}^{t} n_{ij}\mu_{ij}, \; j = 1, 2, \ldots, b$$

Probably the only case in which an experimenter may be interested in these weighted
means is in sample survey type experiments (see the third recommendation at the end of
Section 10.4).

For the data in Table 9.1, the best estimates of these weighted means are given by
$\bar{y}_{1\cdot\cdot} = 22.75$ and $\bar{y}_{2\cdot\cdot} = 27.125$ for the T-weighted means and by
$\bar{y}_{\cdot 1\cdot} = 22.4$, $\bar{y}_{\cdot 2\cdot} = 23.8$, and $\bar{y}_{\cdot 3\cdot} = 28.0$ for the
B-weighted means.
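
The difference between the weighted means reported by the MEANS statement and the least squares means can be seen in a few lines. This is a minimal sketch in plain Python, assuming the same cell means and cell sample sizes as elsewhere in the chapter; the variable names are illustrative only.

```python
# Cell means and cell sample sizes (rows T1, T2; columns B1, B2, B3); assumed
# to match the data in Table 9.1 as summarized in this chapter.
cell_mean = [[20.0, 25.0, 24.0], [26.0, 23.0, 32.0]]
cell_n = [[3, 2, 3], [2, 3, 3]]
t, b = len(cell_mean), len(cell_mean[0])

# Weighted marginal means: each cell mean is weighted by its sample size.
weighted_T = [sum(cell_n[i][j] * cell_mean[i][j] for j in range(b)) / sum(cell_n[i])
              for i in range(t)]
weighted_B = [sum(cell_n[i][j] * cell_mean[i][j] for i in range(t))
              / sum(cell_n[i][j] for i in range(t))
              for j in range(b)]

# Least squares (unweighted) marginal means: every cell mean counts equally.
unweighted_T = [sum(cell_mean[i]) / b for i in range(t)]
unweighted_B = [sum(cell_mean[i][j] for i in range(t)) / t for j in range(b)]

print("weighted T:  ", weighted_T)
print("unweighted T:", unweighted_T)
print("weighted B:  ", weighted_B)
print("unweighted B:", unweighted_B)
```

The weighted values match the estimates quoted above, while the unweighted values match the least squares means in Table 10.20.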


10.8 Concluding Remarks 

In this chapter, the analysis of two-way treatment structures with unequal subclass
numbers is considered using the effects model under the assumption that all treatment
combinations were observed. The effects model was used, even though the means model
provides answers to all questions that can be raised, because much of the existing
statistical computing software utilizes the effects model. Types I-III analyses and the conditions
for using each analysis type were discussed. In almost all cases, the type III analysis will
be the preferred analysis. The type III analysis is the same as the analysis given in Chapter 9.
Population marginal means were contrasted with weighted marginal means, and the
appropriateness of each kind of mean was considered.

It may be of some interest to note that, if one were to fit a two-way additive model,

$y_{ijk} = \mu + \tau_i + \beta_j + \varepsilon_{ijk}, \quad i = 1, 2, \ldots, t; \; j = 1, 2, \ldots, b; \; k = 1, 2, \ldots, n_{ij}$

to the data in Table 9.1, the types II and III sums of squares for T and B would be identical
to one another and they would both be equal to the type II sum of squares for T and B in
the analysis of the two-way model with the interaction term included.
10.9 Exercises

10.1 Consider the experiment described in Exercise 9.1 

1) Analyze the data with SAS-GLM using an effects model. Include the
following options on the MODEL statement: E, E1, E3, and SOLUTION. Also
have GLM calculate the estimates of the drug main effect means, the disease
main effect means, and the two-way means and have GLM do pairwise
comparisons between these three sets of means.

2) Write out the hypotheses being tested by the type I analysis using the notation
of the means model as was done in Table 10.9, where $\mu_{ij}$ represents the
expected response to drug i and disease j. Express these in terms of the
effects model.

3) Write out the hypotheses being tested by the type III analysis in terms of the 
means model. 

4) Write out the hypotheses being tested by the type II analysis in terms of the 
means model. 

5) Restate the following hypotheses in terms of the effects model and use
CONTRAST and/or ESTIMATE statements with the effects model to
determine the observed significance levels for the following hypotheses:

a) (fl l .+p 2 )/2=fl 3 .=f^. 

b) $\mu_{11} = \mu_{12} = \mu_{13}$.

c) $(\bar{\mu}_{1\cdot} + \bar{\mu}_{2\cdot})/2 = \bar{\mu}_{3\cdot}$.

d) $\mu_{12} = \mu_{22} = \mu_{32} = \mu_{42}$.

e) hs- — ft. = 0. 

f) $\mu_{12} - \mu_{13} - \mu_{32} + \mu_{33} = 0$.

g) $\mu_{22} - \mu_{32} = 0$.

10.2 Using the data from Exercise 8.1 where the sample sizes in each cell are equal, 
demonstrate that the type I, type II, type III, and type IV analyses are identical. 

10.3 Verify that the Drug LSMEANS for 10.1 are the averages of the two-way cell 
means. 

10.4 Use the method described in Section 10.4 to compute the type III analysis for the 
data set in Example 6.1. Use GLM to verify your results. 





Analyzing Large Balanced Two-Way Experiments 
Having Unequal Subclass Numbers 


In this chapter we present a method for obtaining an approximate analysis of balanced- 
treatment-structure experiments that have an unequal number of observations in each cell. 


11.1 Feasibility Problems 

We generally recommend using a general computing package such as one of those discussed 
in Section 10.7 to analyze unbalanced data sets in which every treatment combination is 
observed at least once. However, many situations arise in practice where it may not be 
feasible to do so, especially in developing countries where one may have several treatment 
factors with each factor having several different levels. In these cases, the exact procedures 
require several matrix inversions, and the size of the matrices to be inverted may exceed the 
capabilities of both the researcher and the computer that is available for use. 

For example, consider an experiment with four factors, each factor occurring at five
levels. To obtain an exact analysis, one will have to be able to invert several large matrices.
The size of the largest matrix requiring inversion in this situation for an exact analysis is a
624 x 624 matrix in the unbalanced case. Obviously, such an experiment is not too unusual.
In addition, some large experiments may require good initial starting values so that an
iterative algorithm can converge to a reasonable solution. The methods in this chapter may
allow researchers to find good initial starting values in those cases where they are having a
hard time getting a program to converge.

An alternative way to analyze these types of messy data situations is called the method of
unweighted means (Bancroft 1968). In Section 11.2 the method is described for two-way
treatment structures. While one rarely encounters two-way experiments that cannot be
analyzed by general computing procedures, the method of unweighted means is easily
discussed and understood for the two-way experiment, and the discussion readily
generalizes to larger experimental situations. Before continuing, we want to point out that it is not
the number of observations measured that makes general procedures unfeasible, but the
number of treatment combinations being studied.


11.2 Method of Unweighted Means 

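Basically, the method of unweighted means approximates the sums of squares
corresponding to each of the effects by using the observed means of the various treatment
combinations. The formulas given here are the ones used for equal sample sizes; however, if the
sample sizes are only slightly unequal, they still give quite accurate results. The correct
sums of squares can be obtained by using the type III analysis, and the formulas presented
in this chapter approximate the type III sums of squares.

Consider the two-way models described in Chapters 9 and 10, where factor T has t
possibilities and factor B has b possibilities. Let $\mu_{ij}$ represent the response one expects to see on a
randomly selected experimental unit that receives the combination of $T_i$ crossed with $B_j$. Let

$\hat{\mu}_{ij} = \bar{y}_{ij\cdot}, \quad i = 1, 2, \ldots, t, \; j = 1, 2, \ldots, b$

Also let

$$\hat{\bar{\mu}}_{i\cdot} = \frac{1}{b} \sum_{j=1}^{b} \hat{\mu}_{ij}, \quad \hat{\bar{\mu}}_{\cdot j} = \frac{1}{t} \sum_{i=1}^{t} \hat{\mu}_{ij}, \quad \text{and} \quad \hat{\bar{\mu}}_{\cdot\cdot} = \frac{1}{bt} \sum_{i=1}^{t} \sum_{j=1}^{b} \hat{\mu}_{ij}$$

For testing $H_{0T}\colon \bar{\mu}_{1\cdot} = \bar{\mu}_{2\cdot} = \cdots = \bar{\mu}_{t\cdot}$, we compute

$$SST = b \sum_{i=1}^{t} (\hat{\bar{\mu}}_{i\cdot} - \hat{\bar{\mu}}_{\cdot\cdot})^2 = b \sum_{i=1}^{t} \hat{\bar{\mu}}_{i\cdot}^2 - bt\,\hat{\bar{\mu}}_{\cdot\cdot}^2$$

which is based on t - 1 degrees of freedom. For testing
$H_{0B}\colon \bar{\mu}_{\cdot 1} = \bar{\mu}_{\cdot 2} = \cdots = \bar{\mu}_{\cdot b}$, we compute

$$SSB = t \sum_{j=1}^{b} (\hat{\bar{\mu}}_{\cdot j} - \hat{\bar{\mu}}_{\cdot\cdot})^2 = t \sum_{j=1}^{b} \hat{\bar{\mu}}_{\cdot j}^2 - bt\,\hat{\bar{\mu}}_{\cdot\cdot}^2$$

which is based on b - 1 degrees of freedom. For testing
$H_{0T \times B}\colon \mu_{ij} - \mu_{i'j} - \mu_{ij'} + \mu_{i'j'} = 0$ for all $i \ne i'$ and $j \ne j'$, we compute

$$SST \times B = \sum_{i=1}^{t} \sum_{j=1}^{b} (\hat{\mu}_{ij} - \hat{\bar{\mu}}_{i\cdot} - \hat{\bar{\mu}}_{\cdot j} + \hat{\bar{\mu}}_{\cdot\cdot})^2 = \sum_{i=1}^{t} \sum_{j=1}^{b} \hat{\mu}_{ij}^2 - b \sum_{i=1}^{t} \hat{\bar{\mu}}_{i\cdot}^2 - t \sum_{j=1}^{b} \hat{\bar{\mu}}_{\cdot j}^2 + bt\,\hat{\bar{\mu}}_{\cdot\cdot}^2$$

which is based on (t - 1)(b - 1) degrees of freedom. Since the above sums of squares are
computed on the basis of means, the usual sum of squares for error,

$$SSError = \sum_{i=1}^{t} \sum_{j=1}^{b} \sum_{k=1}^{n_{ij}} (y_{ijk} - \bar{y}_{ij\cdot})^2$$
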
i =1 j-1 k =1 
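TABLE 11.1

Unweighted Means Analysis of Variance Table

Source of Variation   df               SS

T                     t - 1            $b \sum_{i=1}^{t} (\hat{\bar{\mu}}_{i\cdot} - \hat{\bar{\mu}}_{\cdot\cdot})^2$

B                     b - 1            $t \sum_{j=1}^{b} (\hat{\bar{\mu}}_{\cdot j} - \hat{\bar{\mu}}_{\cdot\cdot})^2$

T x B                 (t - 1)(b - 1)   $\sum_{i=1}^{t} \sum_{j=1}^{b} (\hat{\mu}_{ij} - \hat{\bar{\mu}}_{i\cdot} - \hat{\bar{\mu}}_{\cdot j} + \hat{\bar{\mu}}_{\cdot\cdot})^2$

Error                 N - tb           $\frac{1}{\tilde{n}} \sum_{i=1}^{t} \sum_{j=1}^{b} \sum_{k=1}^{n_{ij}} (y_{ijk} - \bar{y}_{ij\cdot})^2$
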


must be adjusted. This is because the $\hat{\mu}_{ij}$ have variances given by $\sigma^2/n_{ij}$ rather than $\sigma^2$, the
variance of the $y_{ijk}$. The adjustment is made by dividing the error sum of squares by

$$\tilde{n} = \left[ \frac{1}{bt} \sum_{i=1}^{t} \sum_{j=1}^{b} \frac{1}{n_{ij}} \right]^{-1}$$

the harmonic mean of the cell sample sizes.

Because the quantity $\tilde{n}$ is the harmonic mean of the sample sizes, it is one possible
average of the $n_{ij}$. The degrees of freedom for error are still N - bt. The analysis of variance
table for an unweighted means analysis is given in Table 11.1. This analysis yields
reasonable approximations to the F-distribution when the cell sample sizes are not too unequal.
The usual recommendation is that this analysis will be acceptable if the sample sizes vary
by no more than a factor of 2.


11.3 Simultaneous Inference and Multiple Comparisons 

The T marginal means and the B marginal means are defined just as they were in 
Chapters 9 and 10. Thus the T marginal means are given by p { . ,i = 1,2,... ,t and the B mar¬ 
ginal means are given by p.j,j = 1,2,... , b. The best estimates of these marginal means are 

„ l b 

Pi. — , ^ Piji i — 1,2,..., t 

b j=t 



j = 1 , 2 ,..., b, respectively 


and 
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The exact estimated standard error of p t . is 


s-e.(k) 



and the exact estimated standard error of p.j is 


s-e.(kj) 



where 


a = 


i = 1 ;=1 k=l _ 

N-tb 


fErrorMS 


If the cell sample sizes are not too unequal these standard errors can be approximated 
by d/^(bn) and ct/vffn), respectively. The estimated standard error of p,. - p?. is 


s.e.(p„-p,,.)= 2.r^ + L — 

b W n H H n i’i 


which can be approximated by <jf2/f (bn). Similarly, 


s.e.(p. - p 


•r) = f J±—+ ±— 

tV1h«,7 tfw, r 


which can be approximated by <jf2/f(tn). 

Next consider a 2 x 2 interaction contrast - p rj - p tj . + p t y for i i' and j J= j'. The best 
estimate of this interaction contrast is fl,j - p rj - Py + Py and its estimated standard error is 
given by 


s-e-dij-Pq 


p l ,+p,,) = a 


1 1 1 
— + — + — + 

n ij n n 



which can be approximated by a \ i (4//i) if the cell sample sizes are not too unequal. 

For each of the above functions of the cell mean parameters, test statistics are given by

$$t_c = \frac{\text{estimate}}{\text{estimated standard error}}$$

where the corresponding hypothesis is rejected if $|t_c| > t_{\alpha/2,\,N-tb}$. Furthermore, a $(1 - \alpha)100\%$
confidence interval is given by

$$\text{estimate} \pm t_{\alpha/2,\,N-tb}\,(\text{estimated standard error})$$

Simultaneous inference procedures described in Chapter 3 can be used, and a researcher
is encouraged to do so. These ideas can be generalized to more general contrasts of the cell
means and/or the marginal means. All of the formulas in Chapter 8 can be approximated
if one simply replaces $n$ with $\tilde{n}$.
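As a rough check on how close the approximation is, the following sketch compares the exact and approximate standard errors of the marginal means. It is only an illustration under stated assumptions: the cell sizes are counted from the data listed in Table 11.4, the error sum of squares (20 on 10 degrees of freedom) is the value quoted in Section 11.4, and the names `se_t_mean`, `se_b_mean`, and `n_tilde` are ours, not the book's.

```python
import math

# Cell sample sizes n_ij (rows T1, T2; columns B1, B2, B3), counted from the
# data listed in Table 11.4. Error SS = 20 on 10 df, as quoted in Section 11.4.
n = [[3, 2, 3],
     [2, 3, 3]]
error_ss, error_df = 20.0, 10
t, b = len(n), len(n[0])

sigma_hat = math.sqrt(error_ss / error_df)

# Harmonic mean of the cell sizes (the n-tilde of the unweighted means method).
n_tilde = (t * b) / sum(1.0 / nij for row in n for nij in row)

def se_t_mean(i):
    """Exact s.e. of the i-th T marginal mean: (sigma/b) * sqrt(sum_j 1/n_ij)."""
    return sigma_hat / b * math.sqrt(sum(1.0 / n[i][j] for j in range(b)))

def se_b_mean(j):
    """Exact s.e. of the j-th B marginal mean: (sigma/t) * sqrt(sum_i 1/n_ij)."""
    return sigma_hat / t * math.sqrt(sum(1.0 / n[i][j] for i in range(t)))

# Approximations obtained by replacing every n_ij with n_tilde.
approx_t = sigma_hat / math.sqrt(b * n_tilde)
approx_b = sigma_hat / math.sqrt(t * n_tilde)

for i in range(t):
    print(f"T{i + 1}: exact {se_t_mean(i):.4f}, approx {approx_t:.4f}")
for j in range(b):
    print(f"B{j + 1}: exact {se_b_mean(j):.4f}, approx {approx_b:.4f}")
```

For these particular cell sizes the two T rows have the same sum of reciprocals, so the exact and approximate values for the T marginal means coincide (about 0.5092); for the B marginal means the exact values are about 0.6455, 0.6455, and 0.5774 against an approximate value of about 0.6236.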


11.4 An Example of the Method of Unweighted Means 

As an example, consider again the data in Table 9.1. Recall that the error sum of squares
was 20 with 10 degrees of freedom, and the table of means is repeated in Table 11.2. The
value of $\tilde{n}$ is

$$\tilde{n} = \left[\frac{1}{(3)(2)}\left(\frac{1}{3} + \frac{1}{2} + \frac{1}{3} + \frac{1}{2} + \frac{1}{3} + \frac{1}{3}\right)\right]^{-1} = 2.5714$$

Next note that

$$\sum_{i,j}\hat{\mu}_{ij}^2 = 20^2 + 25^2 + \cdots + 32^2 = 3830$$
$$\sum_{i}\hat{\bar{\mu}}_{i\cdot}^2 = 23^2 + 27^2 = 1258$$
$$\sum_{j}\hat{\bar{\mu}}_{\cdot j}^2 = 23^2 + 24^2 + 28^2 = 1889$$

and

$$\hat{\bar{\mu}}_{\cdot\cdot}^2 = (25)^2 = 625$$

Therefore

SST = 3(1258) - 6(625) = 24
SSB = 2(1889) - 6(625) = 28

and

SST x B = 3830 - 3(1258) - 2(1889) + 6(625) = 28


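TABLE 11.2

Cell Means and Marginal Means from Table 9.2

$\hat{\mu}_{ij}$              $B_1$   $B_2$   $B_3$   $\hat{\bar{\mu}}_{i\cdot}$
$T_1$                         20      25      24      23
$T_2$                         26      23      32      27
$\hat{\bar{\mu}}_{\cdot j}$   23      24      28      25 = $\hat{\bar{\mu}}_{\cdot\cdot}$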
TABLE 11.3

The Unweighted Means Analysis of Variance Table for Data in Table 9.1

Source of Variation   df   SS                    MS        F       p-Value
T                      1   24                    24        30.85   0.0002
B                      2   28                    14        17.99   0.0005
TxB                    2   28                    14        17.99   0.0005
Error(adj)            10   20/2.5714 = 7.7778    0.77778


The analysis of variance table is given in Table 11.3. Note that the F-statistics in Table 11.3 
are very similar to the exact F-statistics given in Table 9.3. 
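The arithmetic of this example can be retraced with a few lines of code. The sketch below is an illustration only; it starts from the cell means of Table 11.2, the cell sizes counted from the data listed in Table 11.4, and the error sum of squares quoted above, and the variable names are ours.

```python
# Cell means (Table 11.2) and cell sizes (counted from the data in Table 11.4).
means = [[20.0, 25.0, 24.0],
         [26.0, 23.0, 32.0]]
n = [[3, 2, 3],
     [2, 3, 3]]
error_ss, error_df = 20.0, 10          # quoted in Section 11.4
t, b = len(means), len(means[0])

# Harmonic mean of the cell sizes.
n_tilde = (t * b) / sum(1.0 / nij for row in n for nij in row)

# Unweighted marginal means and the overall mean of the cell means.
t_means = [sum(row) / b for row in means]
b_means = [sum(means[i][j] for i in range(t)) / t for j in range(b)]
grand = sum(t_means) / t

# Sums of squares computed on the cell means, as if the data were balanced.
sst = b * sum((m - grand) ** 2 for m in t_means)
ssb = t * sum((m - grand) ** 2 for m in b_means)
ss_cells = sum((means[i][j] - grand) ** 2 for i in range(t) for j in range(b))
sstb = ss_cells - sst - ssb

# Error mean square adjusted to the scale of a cell mean.
error_ms_adj = (error_ss / error_df) / n_tilde

f_t = (sst / (t - 1)) / error_ms_adj
f_b = (ssb / (b - 1)) / error_ms_adj
f_tb = (sstb / ((t - 1) * (b - 1))) / error_ms_adj

print(f"n_tilde = {n_tilde:.4f}, adjusted error MS = {error_ms_adj:.5f}")
print(f"SST = {sst:.0f}, SSB = {ssb:.0f}, SSTxB = {sstb:.0f}")
print(f"F_T = {f_t:.2f}, F_B = {f_b:.2f}, F_TxB = {f_tb:.2f}")
```

Carried at full precision, the F-ratios for B and TxB come out as 18.00 rather than the 17.99 shown in Table 11.3; the small difference appears to come from rounding the adjusted error mean square before dividing.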


11.5 Computer Analyses 

Although we do not recommend it, the statistics needed for an unweighted means analysis 
can be obtained in Excel® and other spreadsheet programs. The unadjusted error sum of 
squares can be obtained most efficiently by considering the experiment as a one-way 
experiment and utilizing a means model. To obtain the T, B, and TxB sums of squares 
required for the unweighted means analysis, one can follow the steps below. 

1) Obtain the cell means for each treatment combination. 

2) Obtain the T marginal means, the B marginal means, and the overall mean. 

3) Compute the variances of the cell means, the T marginal means, and the B marginal
means. Denote these by $S_{TB}^2$, $S_T^2$, and $S_B^2$, respectively.

4) Then

$$\text{error SS} = \sum_{i,j,k} y_{ijk}^2 - \sum_{i,j} n_{ij}\hat{\mu}_{ij}^2$$

$$\text{SST} = b(t - 1)S_T^2 \qquad (11.1)$$

$$\text{SSB} = t(b - 1)S_B^2$$

and

$$\text{SST x B} = (bt - 1)S_{TB}^2 - b(t - 1)S_T^2 - t(b - 1)S_B^2$$
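The steps above can be sketched in a general-purpose language as well as in a spreadsheet. The following is a minimal illustration, assuming the raw observations listed in Table 11.4; the dictionary layout and variable names are ours, and `statistics.variance` computes the usual sample variance with divisor one less than the number of values.

```python
from statistics import variance

# Raw data from Table 11.4, keyed by (T level, B level).
cells = {
    (1, 1): [19.0, 20.0, 21.0], (1, 2): [24.0, 26.0], (1, 3): [22.0, 25.0, 25.0],
    (2, 1): [25.0, 27.0], (2, 2): [21.0, 24.0, 24.0], (2, 3): [31.0, 32.0, 33.0],
}
t, b = 2, 3

# Step 1: cell means.
cell_mean = {key: sum(ys) / len(ys) for key, ys in cells.items()}

# Step 2: unweighted marginal means.
t_means = [sum(cell_mean[(i, j)] for j in range(1, b + 1)) / b for i in range(1, t + 1)]
b_means = [sum(cell_mean[(i, j)] for i in range(1, t + 1)) / t for j in range(1, b + 1)]

# Step 3: sample variances of the cell means and of the marginal means.
s2_tb = variance(cell_mean.values())
s2_t = variance(t_means)
s2_b = variance(b_means)

# Step 4: sums of squares.
error_ss = (sum(y * y for ys in cells.values() for y in ys)
            - sum(len(ys) * cell_mean[key] ** 2 for key, ys in cells.items()))
sst = b * (t - 1) * s2_t
ssb = t * (b - 1) * s2_b
sstb = (b * t - 1) * s2_tb - sst - ssb

print(error_ss, sst, ssb, sstb)
```

The error sum of squares obtained this way is the unadjusted one (20 for these data); it still has to be divided by the harmonic mean of the cell sizes before it is compared with SST, SSB, and SST x B.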

If one has programs that will perform a one-way ANOVA, and a program that will perform
a two-way ANOVA for balanced data, then one can get the error sum of squares from
the one-way ANOVA program by treating the two-way treatment structure as a one-way 
treatment structure using a means model. The sums of squares for T, B, and TxB can be 
obtained by analyzing the cell means with the two-way ANOVA program that works for 
balanced data. SAS® code that will give the statistics needed to perform an unweighted 
means analysis is given in Table 11.4. It will be up to the reader to run this code should 
she wish to do so. 
TABLE 11.4

SAS Code to Provide Statistics Needed for an Unweighted Means Analysis

DATA;
INPUT T B Y;
TB = 10*T + B;   /* single label for each T x B cell, e.g., 12 = T1 with B2 */
CARDS;
1 1 19
1 1 20
1 1 21
1 2 24
1 2 26
1 3 22
1 3 25
1 3 25
2 1 25
2 1 27
2 2 21
2 2 24
2 2 24
2 3 31
2 3 32
2 3 33
;
PROC PRINT;
TITLE 'AN UNWEIGHTED MEANS ANALYSIS OF A TWO-WAY WITH UNEQUAL SUBCLASS NUMBERS';

/* One-way means model on the cells: supplies the unadjusted error SS */
PROC ANOVA;
TITLE2 'THIS ANALYSIS GIVES THE UNADJUSTED ERROR SUM OF SQUARES IN TABLE 11.3';
CLASS TB;
MODEL Y=TB;
RUN;

/* Reduce the data to one record (the cell mean YBAR) per T x B cell */
PROC SORT; BY T B; RUN;

PROC MEANS NOPRINT; BY T B;
VARIABLE Y;
OUTPUT OUT=MEANS MEAN=YBAR;
RUN;

/* Balanced two-way ANOVA on the cell means: supplies the T, B, and T*B SS */
PROC ANOVA;
TITLE2 'THIS ANALYSIS GIVES THE T, B, AND T*B SUMS OF SQUARES GIVEN IN TABLE 11.3';
CLASSES T B;
MODEL YBAR=T B T*B;
MEANS T B T*B;
RUN;


11.6 Concluding Remarks 

In this chapter, we introduced a method for obtaining satisfactory statistical analyses of 
experiments involving large numbers of different treatment combinations where each 
treatment combination is observed at least once. The techniques are useful for persons 
who may not have access to sophisticated statistical software programs. 
11.7 Exercises

11.1 Use the data in Table 9.1 to verify the formulas given in Equation 11.1. 

11.2 If you have SAS available to you, run the code given in Table 11.4. 

11.3 Compute the approximate standard errors for the T marginal means and the B
marginal means for the data in Table 9.1 using the formulas $\text{s.e.}(\hat{\bar{\mu}}_{i\cdot}) = \hat{\sigma}/\sqrt{b\tilde{n}}$
and $\text{s.e.}(\hat{\bar{\mu}}_{\cdot j}) = \hat{\sigma}/\sqrt{t\tilde{n}}$, and compare them to the exact standard errors given in
Chapter 9.

11.4 Obtain the unweighted means analysis of variance for the drug-disease data in 
Exercise 9.1. 

11.5 In Exercise 11.4, compute the estimates and their approximate estimated standard
errors for each of the following.

1) rv-pu 

2) $\mu_{12} - \mu_{13} - \mu_{32} + \mu_{33}$

3) $\mu_{22} - \mu_{32}$

4) $\bar{\mu}_{2\cdot}$

5) $\bar{\mu}_{\cdot 2}$




Case Study: Balanced Two-Way Treatment 
Structure with Unequal Subclass Numbers 


In this chapter, we analyze a set of data arising from a two-way treatment structure 
conducted in a randomized complete block design. The experiment was intended to have 
been balanced, but due to unforeseen circumstances some treatment combinations were 
missing from some blocks. We still assume, however, that every treatment combination is 
observed at least once. The cases where some treatment combinations are never observed 
are discussed in Chapters 13-15. 


12.1 Fat-Surfactant Example 

A bakery scientist wanted to study the effects of combining three different fats with each 
of three different surfactants on the specific volume of bread loaves baked from doughs 
mixed from each of the nine treatment combinations. Four flours of the same type but from 
different sources were used as blocking factors. That is, loaves were made using all nine 
treatment combinations for each of the four flours. Unfortunately, one container of yeast 
turned out to be ineffective, and the data from the 10 loaves made with that yeast had to be 
removed from the analysis. Fortunately, all nine fat x surfactant treatment combinations 
were observed at least once. The data are given in Table 12.1. 

The data in Table 12.1 were analyzed using the SAS®-GLM procedure. Since all treatment 
combinations are observed at least once, and since some of the same treatment combinations 
are observed in each block (flour), the type III sums of squares test hypotheses that are 
interesting and easy to interpret. The hypotheses tested by the type III analysis would be 
the same as those tested if there were no missing data. Consequently, we can predict what 
those hypotheses are, and we do not need to include the E3 option to identify the hypotheses
that are being tested. The normal equations are being solved using the sum-to-zero
restrictions, so the results from a Solution option will not be interesting and consequently 
the option is not used in this SAS-GLM analysis. All marginal means are estimable, and 
their estimates have been obtained by using the LSMeans option. 
TABLE 12.1

Specific Volumes from a Baking Experiment

                            Flour
Fat   Surfactant     1      2      3      4
1     1             6.7    4.3    5.7
1     2             7.1           5.9    5.6
1     3                    5.5    6.4    5.8
2     1                    5.9    7.4    7.1
2     2                    5.6           6.8
2     3             6.4    5.1    6.2    6.3
3     1             7.1    5.9
3     2             7.3    6.6    8.1    6.8
3     3                    7.5    9.1

The data are analyzed using the SAS commands given in Table 12.2. The ANOVA results 
from the commands in Table 12.2 are given in Table 12.3. The type III F-value for the 
fat x surfactant interaction is F = 8.52, which is significant at the p = 0.0011 level. Thus, surfactants
should be compared within each fat level, and the fats should be compared within
each surfactant level. Consequently, the fat x surfactant least squares means are given in 
Table 12.4 and pairwise comparisons among these two-way least squares means are given 
in Table 12.5. 

Figure 12.1 gives a plot of the two-way least squares means; sample means located within 
the same circle are not significantly different. The p-values used are those given in 
Table 12.5. 
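Two details of this analysis can be checked without fitting the model. The sketch below is only an illustration with names of our own choosing: it rearranges the values of Table 12.1 from one row per fat x surfactant combination into one record per loaf (the rearrangement the second DATA step of Table 12.2 performs), and then looks at the two treatment combinations that were observed in all four flours. For such a cell the least squares mean from a flour-plus-treatment-combination model should coincide with the simple cell average, with standard error equal to the square root of the error mean square divided by 4; the error mean square used here is the value reported in Table 12.3.

```python
import math

# Table 12.1 in "wide" form: (fat, surfactant) -> values for flours 1-4,
# with None marking a loaf that was removed from the analysis.
wide = {
    (1, 1): [6.7, 4.3, 5.7, None],
    (1, 2): [7.1, None, 5.9, 5.6],
    (1, 3): [None, 5.5, 6.4, 5.8],
    (2, 1): [None, 5.9, 7.4, 7.1],
    (2, 2): [None, 5.6, None, 6.8],
    (2, 3): [6.4, 5.1, 6.2, 6.3],
    (3, 1): [7.1, 5.9, None, None],
    (3, 2): [7.3, 6.6, 8.1, 6.8],
    (3, 3): [None, 7.5, 9.1, None],
}

# "Long" form: one (flour, fat, surfactant, spvol) record per observed loaf.
long_form = [
    (flour, fat, surf, y)
    for (fat, surf), values in wide.items()
    for flour, y in enumerate(values, start=1)
    if y is not None
]
n_obs = len(long_form)

mse = 0.16541826  # error mean square reported in Table 12.3

# Cells observed in every flour: simple mean and its standard error.
complete = {}
for cell, values in wide.items():
    if None not in values:
        complete[cell] = (sum(values) / len(values), math.sqrt(mse / len(values)))

print(n_obs)
for cell, (mean, se) in complete.items():
    print(cell, round(mean, 4), round(se, 8))
```

The record count (26) matches the 25 corrected total degrees of freedom in Table 12.3, and the two complete cells, fat 2 with surfactant 3 and fat 3 with surfactant 2, reproduce the least squares means 6.0 and 7.2 and the standard error 0.2034 shown in Table 12.4.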


TABLE 12.2

SAS Analyses of Data in Table 12.1

/* One input line per fat x surfactant combination; F1-F4 are the four flours */
/* and a period marks a missing loaf                                          */
DATA BREAD;
INPUT FAT SURF F1-F4;
LINES;
1 1 6.7 4.3 5.7 .
1 2 7.1 .   5.9 5.6
1 3 .   5.5 6.4 5.8
2 1 .   5.9 7.4 7.1
2 2 .   5.6 .   6.8
2 3 6.4 5.1 6.2 6.3
3 1 7.1 5.9 .   .
3 2 7.3 6.6 8.1 6.8
3 3 .   7.5 9.1 .
;
RUN;

/* Rearrange to one record per loaf with variables FLOUR and SPVOL */
DATA BREAD;
SET BREAD;
DROP F1-F4;
FLOUR=1; SPVOL=F1; OUTPUT;
FLOUR=2; SPVOL=F2; OUTPUT;
FLOUR=3; SPVOL=F3; OUTPUT;
FLOUR=4; SPVOL=F4; OUTPUT;
RUN;

PROC PRINT DATA=BREAD; RUN;

PROC GLM DATA=BREAD;
CLASSES FLOUR FAT SURF;
MODEL SPVOL=FLOUR FAT|SURF;
LSMEANS FAT|SURF/PDIFF STDERR;
ODS OUTPUT LSMEANS=LSM;
RUN;

/* Plot the fat x surfactant least squares means */
SYMBOL1 V=SQUARE I=JOIN C=BLACK L=1;
SYMBOL2 V=CIRCLE I=JOIN C=BLACK L=2;
SYMBOL3 V=DIAMOND I=JOIN C=BLACK L=3;
AXIS1 ORDER=(1 TO 3 BY 1) OFFSET=(1 CM,);

PROC GPLOT; WHERE EFFECT='FAT_SURF';
PLOT LSMEAN*FAT=SURF/HAXIS=AXIS1;
RUN;


TABLE 12.3

Model ANOVA and Tests on Main Effects and Interaction

Source             df   Sum of Squares   Mean Square   F-Value   Pr > F
Model              11   22.51952891      2.04722990    12.38     <0.0001
Error              14    2.31585570      0.16541826
Corrected total    25   24.83538462

Source             df   Type III SS      Mean Square   F-Value   Pr > F
Flour               3    8.69081097      2.89693699    17.51     <0.0001
Fat                 2   10.11784983      5.05892492    30.58     <0.0001
Surfactant          2    0.99720998      0.49860499     3.01     0.0815
Fat x surfactant    4    5.63876453      1.40969113     8.52     0.0011


TABLE 12.4

Fat x Surfactant Least Squares Means

Fat   Surfactant   SPVOL LSMean   Standard Error   Pr > |t|   LSMean Number
1     1            5.53635388     0.24036653       <0.0001    1
1     2            5.89132489     0.23921852       <0.0001    2
1     3            6.12291175     0.24137422       <0.0001    3
2     1            7.02291175     0.24137422       <0.0001    4
2     2            6.70848186     0.30057982       <0.0001    5
2     3            6.00000000     0.20335822       <0.0001    6
3     1            6.62864505     0.30066843       <0.0001    7
3     2            7.20000000     0.20335822       <0.0001    8
3     3            8.58889843     0.30013634       <0.0001    9
TABLE 12.5

Pairwise Comparisons among the Fat x Surfactant Least Squares Means

Least Squares Means for Effect Fat x Surfactant
Pr > |t| for H0: LSMean(i) = LSMean(j)
Dependent Variable: SPVOL

i/j   1         2         3         4         5         6         7         8         9
1               0.3156    0.1105    0.0007    0.0098    0.1630    0.0118    0.0001    <0.0001
2     0.3156              0.5099    0.0052    0.0546    0.7344    0.0788    0.0009    <0.0001
3     0.1105    0.5099              0.0169    0.1428    0.7028    0.2203    0.0042    <0.0001
4     0.0007    0.0052    0.0169              0.4184    0.0059    0.3341    0.5836    0.0010
5     0.0098    0.0546    0.1428    0.4184              0.0712    0.8550    0.1971    0.0006
6     0.1630    0.7344    0.7028    0.0059    0.0712              0.1053    0.0009    <0.0001
7     0.0118    0.0788    0.2203    0.3341    0.8550    0.1053              0.1378    0.0004
8     0.0001    0.0009    0.0042    0.5836    0.1971    0.0009    0.1378              0.0018
9     <0.0001   <0.0001   <0.0001   0.0010    0.0006    <0.0001   0.0004    0.0018

FIGURE 12.1 Plot of least squares means (vertical axis: SPVOL LSMean). Means located within the same circle are not significantly different.


From Figure 12.1, we can make the following observations: 

1) The combination of fat 3 with surfactant 3 gives a response that is significantly 
higher than those given by all other treatment combinations. 

2) Fat 3 generally gives a response that is significantly higher than that given by fat 1. 

3) There is no significant difference among the surfactant levels when they are used with fat 1.
12.2 Concluding Remarks

In the next chapter, the case where some treatment combinations are never observed is 
discussed. Such cases require additional care in selecting a proper analysis. In this chapter, 
we considered the analysis of a balanced two-way treatment structure in a randomized 
complete block design when some treatment combinations are missing in some blocks. 
The analysis described is appropriate only when each treatment combination is observed 
at least once. SAS-GLM was used to obtain the analysis. The data set can be more
appropriately analyzed by considering sources of flour as a random effect as discussed in
Chapter 22.


12.3 Exercises 

12.1 The data in the following table are breaking strengths of beams made from combinations
of types of cement and mixtures of aggregate. Four beams were made
from each combination, but some of the beams were of poor quality due to the
fabrication process and were not tested for strength.

1) Use a means model and determine if there is an interaction between the 
levels of cement and levels of aggregate. 

2) Use an effects model and determine if there is an interaction between the 
levels of cement and levels of aggregate. 

3) Carry out a complete analysis of the data. 

4) Exclude cement type 3 from the data set and work through parts 1-3 with 
the reduced data set. 


Cement    Aggregate A   Aggregate B   Aggregate C   Aggregate D
Type 1    21            19            19            23
Type 1    27            19            16            24
Type 1    19            22            —             23
Type 1    —             —             —             —
Type 2    25            23            19            28
Type 2    23            20            18            27
Type 2    24            24            —             25
Type 2    —             18            —             —
Type 3    20            28            14            23
Type 3    24            —             16            25
Type 3    —             —             12            22
Type 3    —             —             —             22





Using the Means Model to Analyze Two-Way 
Treatment Structures with Missing Treatment 
Combinations 


In this chapter and the next two chapters, we discuss the analysis of two-way treatment
structures when some treatment combinations are never observed. These kinds of experimental
situations often occur in practice, mostly by chance but sometimes by design. When
the experimenter does have control over the experiment, extreme care should be taken to
ensure that all treatment combinations are observed.

Many statistical packages contain routines that calculate test statistics for experiments
with missing treatment combinations, but it is shown in this chapter that the observed
values of those test statistics often have little, if any, meaning. Thus, the available statistical
packages may give the experimenter a false sense of security about the analysis
when, in fact, the analysis automatically provided is not generally an analysis of interest.
The following sections point out some of the problems and provide methods to obtain an
appropriate, meaningful analysis.


13.1 Parameter Estimation 

As in Chapter 9, the effect of missing treatment combinations complicates the analysis 
enough that a very simple set of hypothetical data is used to aid the discussion. A realistic 
example is discussed in Chapter 15. 

Consider the hypothetical data in Table 13.1 from a two-way treatment structure in a
completely randomized design with treatments T and B each having three levels. Let $\mu_{ij}$
represent the response expected when treatments $T_i$ and $B_j$ are applied to a randomly
selected experimental unit. A general means model for this experiment is

$$y_{ijk} = \mu_{ij} + \varepsilon_{ijk}, \quad i = 1, 2, \ldots, t;\ j = 1, 2, \ldots, b;\ k = 1, 2, \ldots, n_{ij} \qquad (13.1)$$
TABLE 13.1

A Two-Way Experiment with Missing Treatment Combinations

         $B_1$    $B_2$    $B_3$
$T_1$    2, 4              7, 6
$T_2$    3        14       10, 9
$T_3$    6, 6     9

TABLE 13.2

Cell Mean Parameters for Data in Table 13.1

         $B_1$         $B_2$         $B_3$
$T_1$    $\mu_{11}$                  $\mu_{13}$
$T_2$    $\mu_{21}$    $\mu_{22}$    $\mu_{23}$
$T_3$    $\mu_{31}$    $\mu_{32}$

where $\varepsilon_{ijk} \sim$ i.i.d. $N(0, \sigma^2)$. If $n_{ij} = 0$ for any $i$ and $j$, then the treatment combination $T_i$ with $B_j$
is not observed. Table 13.2 contains the cell mean parameters for those treatment combinations
that were observed at least once. Note that, even though Table 13.2 does not have cell
mean parameters corresponding to the (1, 2) cell and the (3, 3) cell, it is assumed that such
parameters exist. That is, we let $\mu_{12}$ and $\mu_{33}$ represent the means model parameters corresponding
to the (1, 2) cell and (3, 3) cell, respectively.

Whenever treatment combinations are missing, certain hypotheses cannot be tested without
making some additional assumptions about the parameters in the model. Hypotheses
involving parameters corresponding to the missing cells generally cannot be tested. For
example, hypotheses that involve $\mu_{12}$ and/or $\mu_{33}$ cannot be tested without making some
assumptions about $\mu_{12}$ and $\mu_{33}$. If one were able to assume that $\mu_{12}$ was equal to $\mu_{11}$, for example,
then a test of a hypothesis involving $\mu_{12}$ can be carried out since $\mu_{12}$ can be estimated by
$\hat{\mu}_{11} = (2 + 4)/2 = 3$ for the data in Table 13.1. Most experimenters would not be willing to
make this kind of an assumption. For the data in Table 13.1, it is not possible to estimate (or
test hypotheses about) linear combinations that involve the parameters $\mu_{12}$ and $\mu_{33}$ unless
one is willing to make some assumptions about these two parameters. One common
assumption is that there is no interaction between the levels of T and the levels of B. This
would be equivalent to assuming that

$$\mu_{12} = \mu_{11} - \mu_{21} + \mu_{22} \quad \text{and} \quad \mu_{33} = -\mu_{21} + \mu_{23} + \mu_{31}$$

as well as assuming that

$$\mu_{11} - \mu_{13} - \mu_{21} + \mu_{23} = 0 \quad \text{and} \quad \mu_{21} - \mu_{22} - \mu_{31} + \mu_{32} = 0$$

In our opinion, such assumptions should not be made without some supporting experimental
evidence that the assumptions are likely to be true. All too often experimenters are
willing to assume no interaction exists among the factors or treatments in any of their
experiments, mainly because they do not understand how to deal with such an interaction
or because they believe they are not interested in it. Neither of these is a justifiable reason
for assuming no interaction between the two sets of treatments. If interaction exists, the
experimenter must deal with it and must be interested in making inferences about the
interaction. In Chapter 8, we discussed methods for dealing with interaction when all
treatment combinations are observed. The kinds of questions considered there can also be
considered here.

As previously stated, it is not possible to make inferences about functions of parameters
involving missing treatment combinations. For example, it is not possible to test $\bar{\mu}_{1\cdot} = \bar{\mu}_{2\cdot} = \bar{\mu}_{3\cdot}$
or $\bar{\mu}_{\cdot 1} = \bar{\mu}_{\cdot 2} = \bar{\mu}_{\cdot 3}$ since these hypotheses involve parameters about which we have no
information. Indeed, it is not possible to estimate all of the population marginal means.
For the above data, one cannot estimate $\bar{\mu}_{1\cdot}$ and $\bar{\mu}_{3\cdot}$, nor can one estimate $\bar{\mu}_{\cdot 2}$ and $\bar{\mu}_{\cdot 3}$.
However, one can estimate $\bar{\mu}_{2\cdot}$ and $\bar{\mu}_{\cdot 1}$ since these marginal mean parameters do not involve
parameters corresponding to missing cells. As one would expect, the best estimates of the
parameters of model (13.1) are

$$\hat{\mu}_{ij} = \bar{y}_{ij\cdot}, \quad i = 1, 2, \ldots, t;\ j = 1, 2, \ldots, b \ \text{ if } n_{ij} > 0$$

and

$$\hat{\sigma}^2 = \frac{\sum_{i,j,k}(y_{ijk} - \bar{y}_{ij\cdot})^2}{N - C}$$

where $N = n_{\cdot\cdot}$ and $C$ = the total number of observed treatment combinations. If $n_{ij} > 0$, the
sampling distribution of $\hat{\mu}_{ij}$ is $N(\mu_{ij}, \sigma^2/n_{ij})$, $i = 1, 2, \ldots, t$; $j = 1, 2, \ldots, b$, and the sampling distribution
of $(N - C)\hat{\sigma}^2/\sigma^2$ is $\chi^2(N - C)$. In addition, $\hat{\mu}_{ij}$, $i = 1, 2, \ldots, t$; $j = 1, 2, \ldots, b$, and $\hat{\sigma}^2$ are
statistically independent.
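These estimates are simple to compute directly. The sketch below is a minimal illustration for the data of Table 13.1, with the two unobserved cells simply absent from the dictionary; the data layout and names are ours.

```python
# Observations from Table 13.1, keyed by (T level, B level); the (1, 2) and
# (3, 3) cells were never observed and so do not appear.
data = {
    (1, 1): [2.0, 4.0], (1, 3): [7.0, 6.0],
    (2, 1): [3.0], (2, 2): [14.0], (2, 3): [10.0, 9.0],
    (3, 1): [6.0, 6.0], (3, 2): [9.0],
}

# Best estimate of each observed cell mean is the cell average.
mu_hat = {cell: sum(ys) / len(ys) for cell, ys in data.items()}

N = sum(len(ys) for ys in data.values())   # total number of observations
C = len(data)                              # number of observed cells

# Pooled within-cell variance on N - C degrees of freedom.
sse = sum((y - mu_hat[cell]) ** 2 for cell, ys in data.items() for y in ys)
sigma2_hat = sse / (N - C)

print(mu_hat)
print(N, C, sse, sigma2_hat)
```

Cells with a single observation contribute an estimate of the cell mean but nothing to the error sum of squares, which is why only N - C = 4 degrees of freedom are available here.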


13.2 Hypothesis Testing and Confidence Intervals 

Clearly, one method of analyzing experiments with missing treatment combinations is to
use the procedures discussed in Chapter 1; in fact, this is often the best method. That is, the
procedures in Chapter 1 can be used to test hypotheses about any linear combinations of
the $\mu_{ij}$ corresponding to observed treatment combinations. We illustrate using the data in
Table 13.1.

13.2.1 Example 13.1 

Suppose we wish to obtain a 95% confidence interval for $\bar{\mu}_{2\cdot}$ in Table 13.1. First, the estimates
of the means in the observed cells are $\hat{\mu}_{11} = 3$, $\hat{\mu}_{13} = 6.5$, $\hat{\mu}_{21} = 3$, $\hat{\mu}_{22} = 14$, $\hat{\mu}_{23} = 9.5$,
$\hat{\mu}_{31} = 6$, $\hat{\mu}_{32} = 9$, and the estimate of $\sigma^2$ is

$$\hat{\sigma}^2 = \frac{\begin{aligned}&(2-3)^2 + (4-3)^2 + (7-6.5)^2 + (6-6.5)^2 + (3-3)^2 + (14-14)^2\\ &\quad + (10-9.5)^2 + (9-9.5)^2 + (6-6)^2 + (6-6)^2 + (9-9)^2\end{aligned}}{11 - 7} = \frac{3}{4} = 0.75$$

Also, this estimate of experimental error, $\hat{\sigma}^2$, is based on 4 degrees of freedom. The best
estimate of $\bar{\mu}_{2\cdot}$ is $\hat{\bar{\mu}}_{2\cdot} = (3 + 14 + 9.5)/3 = 8.833$ and its estimated standard error is

$$\text{s.e.}(\hat{\bar{\mu}}_{2\cdot}) = \sqrt{\frac{\hat{\sigma}^2}{b^2}\left(\frac{1}{n_{21}} + \frac{1}{n_{22}} + \frac{1}{n_{23}}\right)} = \sqrt{\frac{0.75}{9}\left(\frac{1}{1} + \frac{1}{1} + \frac{1}{2}\right)} = 0.4564$$

Thus a 95% confidence interval for $\bar{\mu}_{2\cdot}$ is 8.833 ± (2.776)(0.4564) = 8.833 ± 1.267 or 7.566 <
$\bar{\mu}_{2\cdot}$ < 10.100.
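The interval can be retraced numerically. The following sketch is an illustration only: it takes the three cell means in row $T_2$, their sample sizes, the error variance estimate 0.75, and the tabled value $t_{0.025,4} = 2.776$ quoted in the text as given; the variable names are ours.

```python
import math

# Row T2 of Table 13.1: estimated cell means and cell sizes for B1, B2, B3.
mu_hat_row2 = [3.0, 14.0, 9.5]
n_row2 = [1, 2 - 1, 2]          # n_21 = 1, n_22 = 1, n_23 = 2
sigma2_hat = 0.75               # error mean square on 4 df
t_crit = 2.776                  # t_{0.025, 4}, as quoted in the text
b = len(mu_hat_row2)

# Unweighted marginal mean and its exact estimated standard error.
estimate = sum(mu_hat_row2) / b
std_err = math.sqrt(sigma2_hat / b ** 2 * sum(1.0 / n for n in n_row2))

half_width = t_crit * std_err
lower, upper = estimate - half_width, estimate + half_width

print(f"estimate = {estimate:.4f}, s.e. = {std_err:.4f}")
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```

The point estimate and standard error obtained this way (about 8.8333 and 0.4564) agree with the values reported for T2_LSM in Table 13.8.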

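Suppose we wish to determine whether there is interaction in this two-way experiment
with missing cells. Generally, the test for interaction in a 3 x 3 experiment would be
based on 4 degrees of freedom if all nine treatment combinations were observed, but because
of the two missing cells, there are only two linearly independent contrasts that measure
interaction in this case. Two linearly independent contrasts that measure interaction are

$$\mu_{11} - \mu_{13} - \mu_{21} + \mu_{23} \quad \text{and} \quad \mu_{21} - \mu_{22} - \mu_{31} + \mu_{32}$$

Thus, consider testing

$$H_{0T\times B}: \mu_{11} - \mu_{13} - \mu_{21} + \mu_{23} = 0 \quad \text{and} \quad \mu_{21} - \mu_{22} - \mu_{31} + \mu_{32} = 0$$

Using the matrix procedure introduced in Chapter 1, we can test the above hypothesis by
testing $C\mu = 0$ where

$$C = \begin{bmatrix} 1 & -1 & -1 & 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & -1 & 0 & -1 & 1 \end{bmatrix} \quad \text{and} \quad \mu = \begin{bmatrix} \mu_{11} \\ \mu_{13} \\ \mu_{21} \\ \mu_{22} \\ \mu_{23} \\ \mu_{31} \\ \mu_{32} \end{bmatrix}$$

The sum of squares due to $H_{0T\times B}$ is

$$SS_{T\times B} = (C\hat{\mu})'(CDC')^{-1}(C\hat{\mu}) \quad \text{where} \quad D = \text{Diag}\left(\tfrac{1}{2}, \tfrac{1}{2}, 1, 1, \tfrac{1}{2}, \tfrac{1}{2}, 1\right)$$

We get

$$SS_{T\times B} = \begin{bmatrix} 3 & -8 \end{bmatrix} \begin{bmatrix} 2.5 & -1 \\ -1 & 3.5 \end{bmatrix}^{-1} \begin{bmatrix} 3 \\ -8 \end{bmatrix} = 18.5161$$

and this sum of squares is based on 2 degrees of freedom. The corresponding F-statistic is
F = (18.5161/2)/0.75 = 12.34 with 2 and 4 degrees of freedom; this F is significant at the 0.0194
level.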
It is not possible to test $\bar{\mu}_{1\cdot} = \bar{\mu}_{2\cdot} = \bar{\mu}_{3\cdot}$ for the data in Table 13.1 because of the two missing
cells. However, it is possible to test

$$H_{0T}: (\mu_{11} + \mu_{13})/2 = (\mu_{21} + \mu_{23})/2 \quad \text{and} \quad (\mu_{21} + \mu_{22})/2 = (\mu_{31} + \mu_{32})/2$$

In a very broad sense, these are T main-effect type hypotheses as the first equation compares
$T_1$ with $T_2$ after averaging over $B_1$ and $B_3$ while the second equation compares $T_2$ with
$T_3$ after averaging over $B_1$ and $B_2$. An experimenter must determine if either of these
hypotheses is of interest, either individually or simultaneously; we use them to illustrate
this method.

To test the hypotheses given by $H_{0T}$, we can take

$$C = \begin{bmatrix} 1 & 1 & -1 & 0 & -1 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 & -1 & -1 \end{bmatrix}$$

Again, $D = \text{Diag}\left(\tfrac{1}{2}, \tfrac{1}{2}, 1, 1, \tfrac{1}{2}, \tfrac{1}{2}, 1\right)$. Then the sum of squares due to $H_{0T}$ is

$$SS_T = (C\hat{\mu})'(CDC')^{-1}(C\hat{\mu}) = \begin{bmatrix} -3 & 2 \end{bmatrix} \begin{bmatrix} 2.5 & -1 \\ -1 & 3.5 \end{bmatrix}^{-1} \begin{bmatrix} -3 \\ 2 \end{bmatrix} = 3.8065$$

with 2 degrees of freedom. The appropriate F-statistic is F = (3.8065/2)/0.75 = 2.5377 with
2 and 4 degrees of freedom, the corresponding p-value is 0.1943 and hence, $H_{0T}$ cannot
be rejected.
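Both sums of squares in this example are instances of the same quadratic form, so a single small routine reproduces them. The sketch below is an illustration with names of our own; it is written for a two-row contrast matrix only, so that the 2 x 2 inverse can be written out explicitly, and it forms each F-ratio as the contrast mean square divided by the error mean square of 0.75.

```python
def contrast_ss(C, mu_hat, n):
    """Return (C mu_hat, SS) with SS = (C mu)' (C D C')^{-1} (C mu), D = diag(1/n).

    Written for a contrast matrix C with exactly two rows.
    """
    d = [1.0 / nk for nk in n]
    est = [sum(c * m for c, m in zip(row, mu_hat)) for row in C]
    # Entries of the symmetric 2 x 2 matrix C D C'.
    a11 = sum(c * c * dk for c, dk in zip(C[0], d))
    a22 = sum(c * c * dk for c, dk in zip(C[1], d))
    a12 = sum(c0 * c1 * dk for c0, c1, dk in zip(C[0], C[1], d))
    det = a11 * a22 - a12 * a12
    # Quadratic form using the explicit inverse of a 2 x 2 matrix.
    ss = (a22 * est[0] ** 2 - 2.0 * a12 * est[0] * est[1] + a11 * est[1] ** 2) / det
    return est, ss

# Observed cells in the order mu_11, mu_13, mu_21, mu_22, mu_23, mu_31, mu_32.
mu_hat = [3.0, 6.5, 3.0, 14.0, 9.5, 6.0, 9.0]
n = [2, 2, 1, 1, 2, 2, 1]
error_ms = 0.75

C_txb = [[1, -1, -1, 0, 1, 0, 0],
         [0, 0, 1, -1, 0, -1, 1]]
C_t = [[1, 1, -1, 0, -1, 0, 0],
       [0, 0, 1, 1, 0, -1, -1]]

est_txb, ss_txb = contrast_ss(C_txb, mu_hat, n)
est_t, ss_t = contrast_ss(C_t, mu_hat, n)

f_txb = (ss_txb / 2) / error_ms
f_t = (ss_t / 2) / error_ms

print(est_txb, round(ss_txb, 4), round(f_txb, 2))
print(est_t, round(ss_t, 4), round(f_t, 2))
```

Because the sum of squares is unchanged when a row of C is multiplied by a nonzero constant, the divisors of 2 in the statement of $H_{0T}$ can be dropped from the contrast coefficients, as was done above.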

The reader might have noticed that $CDC'$ was the same for both of these last two examples;
this is coincidental and is not generally true. When some treatment combinations
are not observed, it is often best to consider the experiment as a one-way experiment and
use computing routines similar to those described in Section 1.7 to answer the important
questions. However, since many statistical packages provide certain tests automatically
when an effects model is used, many experimenters have preferred them. Such analyses
are described in Chapter 14.


13.3 Computer Analyses 

The data in Table 13.1 can be analyzed using the SAS®-GLM procedure using the statements
in Table 13.3. Since the T and B main effects are not included in the Model statement
and the Noint option is used as an option on the Model statement, a two-way means model
is used to describe these data. The two Contrast statements that are used in the analysis
correspond to the two sums of squares calculated in Section 13.2, namely the sums of
squares for $H_{0T\times B}$ and $H_{0T}$.

A listing of the data is in Table 13.4 and the general form of the estimable functions is
shown in Table 13.5. An examination of Table 13.5 reveals that every linear combination of
the cell means is estimable. Also note that there are no rows in the general form of estimable
functions corresponding to the two missing cells.
TABLE 13.3

SAS-GLM Commands

DATA;
INPUT T B Y @@;
CARDS;
1 1 2  1 1 4
1 3 7  1 3 6
2 1 3
2 2 14
2 3 10  2 3 9
3 1 6  3 1 6
3 2 9
;
PROC PRINT;
TITLE 'EX. 13.1 - A TWO-WAY WITH MISSING CELLS';
RUN;

/* Means model: only T*B in the model, with no intercept */
PROC GLM;
CLASSES T B;
MODEL Y = T*B/NOINT E SOLUTION;
/* Coefficients follow the observed cells: 11 13 21 22 23 31 32 */
ESTIMATE 'T2_LSM' T*B 0 0 1 1 1 0 0/DIVISOR=3;
CONTRAST 'Ho_T*B' T*B 1 -1 -1 0 1 0 0, T*B 0 0 1 -1 0 -1 1;
CONTRAST 'Ho_T' T*B 1 1 -1 0 -1 0 0, T*B 0 0 1 1 0 -1 -1;
LSMEANS T*B/PDIFF STDERR;
RUN;


TABLE 13.4

Data for Example 13.1

Observation   T   B   Y
1             1   1   2
2             1   1   4
3             1   3   7
4             1   3   6
5             2   1   3
6             2   2   14
7             2   3   10
8             2   3   9
9             3   1   6
10            3   1   6
11            3   2   9


The ANOVA table is shown in Table 13.6. The F-value of 122.10 corresponding to the
model F, the type I analysis, and the type III analysis is testing the hypothesis that all $\mu_{ij}$
are equal to zero. That is, F = 122.10 tests

$$H_0: \mu_{11} = \mu_{13} = \mu_{21} = \mu_{22} = \mu_{23} = \mu_{31} = \mu_{32} = 0$$

Note also that the error mean square is 0.75, which agrees with the estimate of $\sigma^2$ given in
Section 13.2.
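Because the model has no intercept, the model sum of squares in Table 13.6 is uncorrected for the mean, which is why its F-statistic tests that every cell mean is zero rather than that the cell means are equal. The sketch below, an illustration with our own variable names, rebuilds the main entries of Table 13.6 from the data of Table 13.1.

```python
import math

# Observations from Table 13.1, keyed by (T level, B level).
data = {
    (1, 1): [2.0, 4.0], (1, 3): [7.0, 6.0],
    (2, 1): [3.0], (2, 2): [14.0], (2, 3): [10.0, 9.0],
    (3, 1): [6.0, 6.0], (3, 2): [9.0],
}
all_y = [y for ys in data.values() for y in ys]
N, C = len(all_y), len(data)

# Uncorrected sums of squares for the no-intercept means model.
total_ss = sum(y * y for y in all_y)
model_ss = sum(len(ys) * (sum(ys) / len(ys)) ** 2 for ys in data.values())
error_ss = total_ss - model_ss

model_ms = model_ss / C
error_ms = error_ss / (N - C)
f_model = model_ms / error_ms

# Remaining summary statistics reported in Table 13.6.
y_mean = sum(all_y) / N
corrected_total_ss = total_ss - N * y_mean ** 2
r_square = 1.0 - error_ss / corrected_total_ss
root_mse = math.sqrt(error_ms)

print(model_ss, error_ss, total_ss)
print(round(f_model, 2), round(r_square, 6), round(root_mse, 6), round(y_mean, 6))
```

Note that the R-square value reported in Table 13.6 is reproduced here from the corrected total sum of squares, even though the model sum of squares itself is uncorrected.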
TABLE 13.5

Estimable Functions

Effect   T   B   Coefficients
TxB      1   1   L1
TxB      1   3   L2
TxB      2   1   L3
TxB      2   2   L4
TxB      2   3   L5
TxB      3   1   L6
TxB      3   2   L7


TABLE 13.6

ANOVA Table for Example 13.1

Dependent Variable: Y

Source              df   Sum of Squares   Mean Square   F-Value   Pr > F
Model                7   641.0000000      91.5714286    122.10    0.0002
Error                4     3.0000000       0.7500000
Uncorrected total   11   644.0000000

R-Square   Coefficient of Variation   Root MSE   Y Mean
0.974771   12.53458                   0.866025   6.909091

Source              df   Type I SS        Mean Square   F-Value   Pr > F
TxB                  7   641.0000000      91.5714286    122.10    0.0002

Source              df   Type III SS      Mean Square   F-Value   Pr > F
TxB                  7   641.0000000      91.5714286    122.10    0.0002


TABLE 13.7

Results from the Contrast Statements for Example 13.1

Contrast   df   Contrast SS    Mean Square   F-Value   Pr > F
Ho_T*B      2   18.51612903    9.25806452    12.34     0.0194
Ho_T        2    3.80645161    1.90322581     2.54     0.1943


The results from the two Contrast statements are shown in Table 13.7. Note that these
agree with the calculations shown in the last section. The results of the Estimate statement
are given in Table 13.8. This corresponds to the $T_2$ main effect mean given in the last section.
Table 13.9 contains the results from the Solution option, which are the estimates and standard
errors of each of the cell mean parameters. These estimates are the same as those in
Table 13.10, which came from the LSMeans statement. Table 13.10 also gives comparisons
between all pairs of the two-way cell means, which are produced by including the pdiff
option on the LSMeans statement.
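The standard errors and t-values in Tables 13.9 and 13.10 follow directly from the cell sizes: each cell mean has estimated standard error $\sqrt{0.75/n_{ij}}$, and each t-value is the cell mean divided by that standard error. The sketch below is an illustration with our own names; p-values are not computed because they would require a t-distribution routine outside the standard library.

```python
import math

error_ms = 0.75  # error mean square on 4 df

# (T, B) -> (estimated cell mean, cell size) for the observed cells.
cells = {
    (1, 1): (3.0, 2), (1, 3): (6.5, 2),
    (2, 1): (3.0, 1), (2, 2): (14.0, 1), (2, 3): (9.5, 2),
    (3, 1): (6.0, 2), (3, 2): (9.0, 1),
}

# Standard error and t-value for each cell mean.
results = {}
for cell, (mean, n) in cells.items():
    se = math.sqrt(error_ms / n)
    results[cell] = (se, mean / se)
    print(cell, f"{mean:.4f}", f"{se:.8f}", f"{mean / se:.2f}")

def pair_t(cell_a, cell_b):
    """t-statistic for the difference between two cell means."""
    (mean_a, n_a), (mean_b, n_b) = cells[cell_a], cells[cell_b]
    se_diff = math.sqrt(error_ms * (1.0 / n_a + 1.0 / n_b))
    return (mean_a - mean_b) / se_diff

print(pair_t((1, 1), (2, 1)))  # equal cell means, so the statistic is zero
```

The zero statistic for the (1, 1) and (2, 1) cells corresponds to the p-value of 1.0000 shown for LSMean numbers 1 and 3 in Table 13.10.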
TABLE 13.8

Results from the Estimate Statements for Example 13.1

Parameter   Estimate     Standard Error   t-Value   Pr > |t|
T2_LSM      8.83333333   0.45643546       19.35     <0.0001


TABLE 13.9

Results from Solution Option on the Model Statement for Example 13.1

Parameter   T   B   Estimate      Standard Error   t-Value   Pr > |t|
TxB         1   1    3.00000000   0.61237244        4.90     0.0080
TxB         1   3    6.50000000   0.61237244       10.61     0.0004
TxB         2   1    3.00000000   0.86602540        3.46     0.0257
TxB         2   2   14.00000000   0.86602540       16.17     <0.0001
TxB         2   3    9.50000000   0.61237244       15.51     0.0001
TxB         3   1    6.00000000   0.61237244        9.80     0.0006
TxB         3   2    9.00000000   0.86602540       10.39     0.0005


TABLE 13.10

Two-Way Least Squares Means for Example 13.1

Least Squares Means

T   B   Y LSMean     Standard Error   Pr > |t|   LSMean Number
1   1    3.0000000   0.6123724        0.0080     1
1   3    6.5000000   0.6123724        0.0004     2
2   1    3.0000000   0.8660254        0.0257     3
2   2   14.0000000   0.8660254        <0.0001    4
2   3    9.5000000   0.6123724        0.0001     5
3   1    6.0000000   0.6123724        0.0006     6
3   2    9.0000000   0.8660254        0.0005     7

Least Squares Means for Effect TxB
Pr > |t| for H0: LSMean(i) = LSMean(j)
Dependent Variable: Y

i/j   1        2        3        4        5        6        7
1              0.0156   1.0000   0.0005   0.0017   0.0257   0.0048
2     0.0156            0.0299   0.0021   0.0257   0.5946   0.0779
3     1.0000   0.0299            0.0009   0.0036   0.0474   0.0080
4     0.0005   0.0021   0.0009            0.0132   0.0017   0.0151
5     0.0017   0.0257   0.0036   0.0132            0.0156   0.6619
6     0.0257   0.5946   0.0474   0.0017   0.0156            0.0474
7     0.0048   0.0779   0.0080   0.0151   0.6619   0.0474

Note: To ensure overall protection level, only probabilities associated with preplanned comparisons
should be used.
13.4 Concluding Remarks

In this chapter we discussed some of the complications that result whenever some treatment
combinations are not observed. We used the means model (the effects
model is used in Chapter 14) to describe the kinds of analyses that are possible. The important
thing to remember when some treatment combinations are not observed is that some
hypotheses of interest may not be testable unless some additional assumptions can be
made about the parameters in the model. However, such assumptions should not be made
without evidence to support them.


13.5 Exercises 

13.1 The following data were collected from a two-way treatment structure in a
completely randomized design structure. Use a means model to answer the
following questions.



B1 

B2 

B3 

Al 

19, 22,17 

29, 25 


A2 

A3 

37 

34, 34, 42, 43 

26,31,23 

A4 

23, 26 

27,23,31,38 

30,14,18 


1) Test the equal means hypothesis. 

2) List the contrasts that measure interaction. 

3) Use SAS-GLM or another statistical program of your choice to find estimates
and their corresponding standard errors for each of the following linear
combinations of the $\mu_{ij}$ provided that the linear combination is estimable.

a) Ps.-jdi. 

b) Ihi-lhz 

p n + jU 12 
' 2 

d) M22 + P 2 3 

2 

e) Hi. 

n AL2 + M22+M42 

3 

g) P. 


4) Find the F-statistic corresponding to each of the following hypotheses if the 
hypothesis is testable. 


a) 


Pn + Pa 


Pm + P42 


2 


2 
b) A41 — l *42 — i°43 

\ t h 1 + H 42 — ^ 23 + ^ 43 
c ' 2 ~ 2 

13.2 An experiment was conducted to determine the amount of a product (g) produced
from 1000 g of product with a given concentration of the target compound
where the reaction was carried out at different temperatures. A completely randomized
design was used to assign the combinations of concentration of the
compound and temperature to a set of runs where each combination was to be
observed three times. Some of the runs did not produce usable data; the reaction
did not occur at lower concentrations at lower temperatures; and some of the
higher concentrations would have produced dangerous results at some of the
higher temperatures. Thus there are unequal numbers of observations per cell
and some cells could not be observed. Use the following data to:

1) Determine the number of linear combinations of the cell means that measure 
the interaction between the levels of concentration and temperature. Write 
out those combinations in terms of the cell means and estimate those 
contrasts. 

2) Carry out a complete analysis of the data set and make possible inferences 
about the levels of concentration, the levels of temperature, and their 
interactions. 


Amount of Product Produced during the Reaction, for Exercise 13.2 
(a hyphen marks a combination that was not observed) 

                     Temperature = 1   Temperature = 2   Temperature = 3   Temperature = 4   Temperature = 5 
Concentration = 1    -                 -                 -                 20, 22            26, 29, 29 
Concentration = 2    -                 -                 16, 14, 18        22                25, 24, 22 
Concentration = 3    -                 12, 15, 13        20, 22, 18        24, 23            29, 30, 26 
Concentration = 4    12, 15            17, 12            23, 20, 20        27, 25            - 
Concentration = 5    14, 16            19, 18, 21        22, 24            -                 - 






Using the Effects Model to Analyze Two-Way 
Treatment Structures with Missing Treatment 
Combinations 


In Chapter 13 we discussed the use of the means model in analyzing two-way treatment 
structures when some treatment combinations are not observed. In this chapter we 
consider the use of the effects model for analyzing the same type of situation. Using the 
effects model does not enable one to answer any questions that cannot be answered using 
the means model, and vice versa. While the means model is very simple and easy to understand, 
the effects model appears to be much more complex than it really is. We prefer to 
use the means model and are discussing the effects model here only because many statistical 
packages seem to recommend and encourage the use of the effects model. The effects 
model considered in this chapter is: 

$$y_{ijk} = \mu + \tau_i + \beta_j + \gamma_{ij} + \varepsilon_{ijk}, \quad i = 1, 2, \ldots, t; \; j = 1, 2, \ldots, b; \; k = 0, 1, 2, \ldots, n_{ij},$$ 

where $\varepsilon_{ijk} \sim$ i.i.d. $N(0, \sigma^2)$, and $n_{ij} = 0$ implies that the $(i, j)$th treatment combination is not 
observed. 


14.1 Type I and II Hypotheses 

Type I and II analyses for two-way treatment structures with missing treatment combinations 
can be defined as for treatment structures where all combinations are observed. That 
is, successive models can be fit, and the resulting reductions in the residual sum of squares 
are determined as different effects are added to the model. 

To illustrate, consider a type I analysis of the data in Table 13.1. We will only consider the 
results here. Readers interested in the actual model fitting results can refer to the previous 
edition of this book. The type I and II analyses of variance tables for these data are given 
in Tables 14.1 and 14.2, respectively. 
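
Each F-value in Tables 14.1 and 14.2 is the mean square of an effect divided by the error mean square. As a quick check, the following Python sketch recomputes the type I F-values from the sums of squares and degrees of freedom printed in Table 14.1. The function name `f_ratios` is ours and does not belong to any statistical package, and no p-values are computed because that would need an F distribution routine.

```python
# Type I sums of squares and degrees of freedom copied from Table 14.1.
TYPE_I = {"T": (36.159, 2), "B": (61.234, 2), "TxB": (18.516, 2)}
ERROR_SS, ERROR_DF = 3.000, 4


def f_ratios(effects, error_ss, error_df):
    """Return {source: (mean square, F)} with F = MS(source) / MS(error)."""
    mse = error_ss / error_df
    return {
        source: (ss / df, (ss / df) / mse)
        for source, (ss, df) in effects.items()
    }


if __name__ == "__main__":
    for source, (ms, f) in f_ratios(TYPE_I, ERROR_SS, ERROR_DF).items():
        print(f"{source:4s} MS = {ms:7.3f}  F = {f:6.2f}")
```

Replacing `TYPE_I` with the sums of squares from another table (type II, III, or IV) gives the corresponding F-values, since the error line is the same in every analysis.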




Analysis of Messy Data Volume 1: Designed Experiments 


TABLE 14.1 

Type I Analysis of Variance Table 

Source of Variation    df    SS         MS        F        α 
Total                  10    118.909 
T                       2     36.159    18.080    24.11    0.0059 
B                       2     61.234    30.617    40.82    0.0022 
TxB                     2     18.516     9.258    12.34    0.0194 
Error                   4      3.000     0.75 




TABLE 14.2 

Type II Analysis of Variance Table 

Source of Variation    df    SS         MS        F        α 
Total                  10    118.909 
T                       2     13.784     6.892     9.19    0.0319 
B                       2     61.234    30.617    40.82    0.0022 
TxB                     2     18.516     9.258    12.34    0.0194 
Error                   4      3.000     0.75 




In Chapter 10 it was shown that the type I and II hypotheses may not make much sense 
when the data are unbalanced even though all treatment combinations are observed at 
least once. It is perhaps obvious then that these two kinds of hypotheses would not all of a 
sudden make sense in the case where there are missing treatment combinations. To see 
that this, in fact, is true, the hypotheses that are being tested by the type I and II analyses 
are given in Tables 14.3 and 14.4 in terms of a means model and in Tables 14.5 and 14.6 in 
terms of an effects model. The entries in these tables can be determined from an SAS®-GLM 
analysis of the data in Table 13.1 or by using the general formulas given in Table 10.9 
and Equation 10.3 as these formulas are also correct for the missing-cells problem. 

Next, we discuss possible interpretations of the type I and II main-effect hypotheses. 
The type I and II hypotheses will generally not make much sense unless one's objective is 
to build a simple model for making predictions rather than to test hypotheses about the 
effects of the different treatment combinations. For model building, the interpretations are 
exactly the same as they were in Chapter 10, where we discussed the case with no missing 
treatment combinations. As in Chapter 10, if the numbers of observations in each cell are 


TABLE 14.3 

Hypotheses for a Type I Analysis for the Data in Table 13.1 in Terms 
of the Means Model 

Source of Variation    Type I Hypotheses 

T      $\mu_{11} + \mu_{13} - \mu_{31} - \mu_{32} = 0$ and 
       $\mu_{21} + \mu_{22} + 2\mu_{23} - 2\mu_{31} - 2\mu_{32} = 0$ 

B      $5\mu_{11} - 5\mu_{13} + 3\mu_{21} + \mu_{22} - 4\mu_{23} + \mu_{31} - \mu_{32} = 0$ and 
       $\mu_{11} - \mu_{13} + 2\mu_{22} - 2\mu_{23} - \mu_{31} + \mu_{32} = 0$ 

TxB    $\mu_{11} - \mu_{13} - \mu_{21} + \mu_{23} = 0$ and $\mu_{21} - \mu_{22} - \mu_{31} + \mu_{32} = 0$ 

TABLE 14.4 

Hypotheses for a Type II Analysis for the Data in Table 13.1 in Terms 
of the Means Model 

Source of Variation    Type II Hypotheses 

T      $2\mu_{11} + \mu_{13} + \mu_{22} - \mu_{23} - 2\mu_{31} - \mu_{32} = 0$ and 
       $2\mu_{11} - 2\mu_{13} + 3\mu_{21} + 4\mu_{22} + 2\mu_{23} - 5\mu_{31} - 4\mu_{32} = 0$ 

B      $5\mu_{11} - 5\mu_{13} + 3\mu_{21} + \mu_{22} - 4\mu_{23} + \mu_{31} - \mu_{32} = 0$ and 
       $\mu_{11} - \mu_{13} + 2\mu_{22} - 2\mu_{23} - \mu_{31} + \mu_{32} = 0$ 

TxB    $\mu_{11} - \mu_{13} - \mu_{21} + \mu_{23} = 0$ and $\mu_{21} - \mu_{22} - \mu_{31} + \mu_{32} = 0$ 

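TABLE 14.5 

Hypotheses for a Type I Analysis for the Data in Table 13.1 in Terms 
of the Effects Model 

Source of Variation    Type I Hypotheses 

T      $\tau_1 - \tau_3 - \tfrac{1}{2}\beta_2 + \tfrac{1}{2}\beta_3 + \tfrac{1}{2}\gamma_{11} + \tfrac{1}{2}\gamma_{13} - \tfrac{1}{2}\gamma_{31} - \tfrac{1}{2}\gamma_{32} = 0$ and 
       $\tau_2 - \tau_3 - \tfrac{1}{4}\beta_1 - \tfrac{1}{4}\beta_2 + \tfrac{1}{2}\beta_3 + \tfrac{1}{4}\gamma_{21} + \tfrac{1}{4}\gamma_{22} + \tfrac{1}{2}\gamma_{23} - \tfrac{1}{2}\gamma_{31} - \tfrac{1}{2}\gamma_{32} = 0$ 

B      $\beta_1 - \beta_3 + (1/9)(5\gamma_{11} - 5\gamma_{13} + 3\gamma_{21} + \gamma_{22} - 4\gamma_{23} + \gamma_{31} - \gamma_{32}) = 0$ and 
       $\beta_2 - \beta_3 + (1/3)(\gamma_{11} - \gamma_{13} + 2\gamma_{22} - 2\gamma_{23} - \gamma_{31} + \gamma_{32}) = 0$ 

TxB    $\gamma_{11} - \gamma_{13} - \gamma_{21} + \gamma_{23} = 0$ and $\gamma_{21} - \gamma_{22} - \gamma_{31} + \gamma_{32} = 0$ 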

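TABLE 14.6 

Hypotheses for a Type II Analysis for the Data in Table 13.1 in Terms 
of the Effects Model 

Source of Variation    Type II Hypotheses 

T      $\tau_1 - \tau_3 + (1/3)(2\gamma_{11} + \gamma_{13} + \gamma_{22} - \gamma_{23} - 2\gamma_{31} - \gamma_{32}) = 0$ and 
       $\tau_2 - \tau_3 + (1/9)(2\gamma_{11} - 2\gamma_{13} + 3\gamma_{21} + 4\gamma_{22} + 2\gamma_{23} - 5\gamma_{31} - 4\gamma_{32}) = 0$ 

B      $\beta_1 - \beta_3 + (1/9)(5\gamma_{11} - 5\gamma_{13} + 3\gamma_{21} + \gamma_{22} - 4\gamma_{23} + \gamma_{31} - \gamma_{32}) = 0$ and 
       $\beta_2 - \beta_3 + (1/3)(\gamma_{11} - \gamma_{13} + 2\gamma_{22} - 2\gamma_{23} - \gamma_{31} + \gamma_{32}) = 0$ 

TxB    $\gamma_{11} - \gamma_{13} - \gamma_{21} + \gamma_{23} = 0$ and $\gamma_{21} - \gamma_{22} - \gamma_{31} + \gamma_{32} = 0$ 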


proportional to the actual numbers of each treatment combination existing in the population, 
then the experimenter might be interested in $R(\tau \mid \mu)$ and $R(\beta \mid \mu)$. Both can be obtained 
by conducting two type I analyses, one with T first in the model statement and the other 
with B first in the model statement. 


14.2 Type III Hypotheses 

When all treatment combinations are observed, the type III hypotheses are the same as 
those tested when there are equal subclass numbers. When some treatment combinations 
are missing, such hypotheses cannot be tested since they involve parameters about which 
there is no information. For the data in Table 13.1, we cannot estimate $\bar{\mu}_{1\cdot}$ and $\bar{\mu}_{3\cdot}$ since 
we cannot estimate $\mu_{12}$ and $\mu_{33}$. Likewise, we cannot estimate $\bar{\mu}_{\cdot 2}$ and $\bar{\mu}_{\cdot 3}$. Hence, it is not 
possible to test $\bar{\mu}_{1\cdot} = \bar{\mu}_{2\cdot} = \bar{\mu}_{3\cdot}$ or $\bar{\mu}_{\cdot 1} = \bar{\mu}_{\cdot 2} = \bar{\mu}_{\cdot 3}$. 

Both the type I and II hypotheses for the main effects depend on the numbers of observations 
in each cell. As long as there is at least one observation in a cell, then that cell mean is 
estimable. Thus functions of the parameters that are estimable depend only on which of 
the treatment combinations are observed and not on how many times they are observed. 

Type III hypotheses are developed so that they do not depend on the cell sizes, but only 
on which cells are observed. This is consistent with the definition of type III hypotheses 
for two-way experiments, where all treatment combinations are observed. That is, the 
hypotheses $\bar{\mu}_{1\cdot} = \bar{\mu}_{2\cdot} = \bar{\mu}_{3\cdot}$ and $\bar{\mu}_{\cdot 1} = \bar{\mu}_{\cdot 2} = \bar{\mu}_{\cdot 3}$ do not depend on the cell sizes. We are not 
going to discuss the construction of type III hypotheses for the missing data case. Even 
though the objectives may seem reasonable, we think that the type III hypotheses are the 
worst hypotheses to consider when there are missing cells because there seems to be no 
reasonable way to interpret them. To illustrate, a type III analysis of the data in Table 13.1 
is given in Table 14.7. The hypotheses being tested by the type III analysis are given in 
Tables 14.8 and 14.9. Examination of Tables 14.8 and 14.9 reveals that the type III hypotheses 


TABLE 14.7 

Type III Analysis of Variance Table 

Source of Variation    df    SS         MS        F        α 
Total                  10    118.909 
T                       2     10.788     5.394     7.19    0.0473 
B                       2     68.906    34.453    45.94    0.0017 
TxB                     2     18.516     9.258    12.34    0.0194 
Error                   4      3.000     0.75 




TABLE 14.8 

Hypotheses for a Type III Analysis for the Data in Table 13.1 in Terms 
of the Means Model 

Source of Variation    Type III Hypotheses 

T      $2\mu_{11} + \mu_{13} + \mu_{22} - \mu_{23} - 2\mu_{31} - \mu_{32} = 0$ and 
       $2\mu_{11} - 2\mu_{13} + 6\mu_{21} + 7\mu_{22} + 2\mu_{23} - 8\mu_{31} - 7\mu_{32} = 0$ 

B      $7\mu_{11} - 7\mu_{13} + 6\mu_{21} + 2\mu_{22} - 8\mu_{23} + 2\mu_{31} - 2\mu_{32} = 0$ and 
       $\mu_{11} - \mu_{13} + 2\mu_{22} - 2\mu_{23} - \mu_{31} + \mu_{32} = 0$ 

TxB    $\mu_{11} - \mu_{13} - \mu_{21} + \mu_{23} = 0$ and $\mu_{21} - \mu_{22} - \mu_{31} + \mu_{32} = 0$ 


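TABLE 14.9 

Hypotheses for a Type III Analysis for the Data in Table 13.1 in Terms 
of the Effects Model 

Source of Variation    Type III Hypotheses 

T      $\tau_1 - \tau_3 + (1/3)(2\gamma_{11} + \gamma_{13} + \gamma_{22} - \gamma_{23} - 2\gamma_{31} - \gamma_{32}) = 0$ and 
       $\tau_2 - \tau_3 + (1/15)(2\gamma_{11} - 2\gamma_{13} + 6\gamma_{21} + 7\gamma_{22} + 2\gamma_{23} - 8\gamma_{31} - 7\gamma_{32}) = 0$ 

B      $\beta_1 - \beta_3 + (1/15)(7\gamma_{11} - 7\gamma_{13} + 6\gamma_{21} + 2\gamma_{22} - 8\gamma_{23} + 2\gamma_{31} - 2\gamma_{32}) = 0$ and 
       $\beta_2 - \beta_3 + (1/3)(\gamma_{11} - \gamma_{13} + 2\gamma_{22} - 2\gamma_{23} - \gamma_{31} + \gamma_{32}) = 0$ 

TxB    $\gamma_{11} - \gamma_{13} - \gamma_{21} + \gamma_{23} = 0$ and $\gamma_{21} - \gamma_{22} - \gamma_{31} + \gamma_{32} = 0$ 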

are not meaningful except for the TxB interaction. The results in Tables 14.7-14.9 were 
taken from a SAS-GLM analysis of the data in Table 13.1. 


14.3 Type IV Hypotheses 

In the previous two sections, we attempted to show that none of the so-called main-effect 
hypotheses tested by the type I, type II, or type III analyses are entirely satisfactory when 
there are missing treatment combinations, since they rarely have any reasonable interpretations 
if there is a possibility of interaction between the two factors. Such hypotheses are 
extremely difficult to interpret because the coefficients of cell means occurring in the same 
row or column are rarely the same. Type IV hypotheses are constructed so that the cell 
mean coefficients are balanced; hence, the resulting hypotheses are interpretable. 

To illustrate, let us look at all possible type IV hypotheses that are testable in the data set 
of Table 13.1. Table 14.10 gives the cell mean parameters for the non-missing cells. Basically, 
for a two-way treatment structure, a hypothesis is defined to be a type IV hypothesis if it compares 
the levels of one treatment averaged over one or more common levels of the other treatment. 
Thus, the hypotheses are expected to be marginal means hypotheses except that, when 
treatment combinations are missing, one cannot average across all levels of the other 
treatment but only across some of the levels of the other factor. 

A type IV hypothesis that compares $T_1$ with $T_2$ after averaging over $B_1$ and $B_3$ is 
$H_0$: $(\mu_{11} + \mu_{13})/2 = (\mu_{21} + \mu_{23})/2$. Another type IV hypothesis that compares $T_1$ with $T_2$ is 
$H_0$: $\mu_{11} = \mu_{21}$. This latter hypothesis compares $T_1$ with $T_2$ after averaging over $B_1$ only. 
All possible type IV hypotheses for T in terms of means model parameters are given in 


TABLE 14.10 

Cell Mean Parameters for Data in Table 13.1 

         $B_1$          $B_2$          $B_3$ 
$T_1$    $\mu_{11}$     -              $\mu_{13}$ 
$T_2$    $\mu_{21}$     $\mu_{22}$     $\mu_{23}$ 
$T_3$    $\mu_{31}$     $\mu_{32}$     - 


TABLE 14.11 

All Possible Type IV Hypotheses for T for the Data in 
Table 13.1 

$(\mu_{11} + \mu_{13})/2 = (\mu_{21} + \mu_{23})/2$ 
$(\mu_{21} + \mu_{22})/2 = (\mu_{31} + \mu_{32})/2$ * 
$\mu_{11} = \mu_{21}$ 
$\mu_{11} = \mu_{31}$ * 
$\mu_{21} = \mu_{31}$ 
$\mu_{22} = \mu_{32}$ 
$\mu_{13} = \mu_{23}$ 

* Hypotheses automatically tested by a SAS-GLM type IV analysis. 

TABLE 14.12 

All Possible Type IV Hypotheses for B for the Data in 
Table 13.1 

$(\mu_{11} + \mu_{21})/2 = (\mu_{13} + \mu_{23})/2$ * 
$(\mu_{21} + \mu_{31})/2 = (\mu_{22} + \mu_{32})/2$ 
$\mu_{11} = \mu_{13}$ 
$\mu_{21} = \mu_{22}$ 
$\mu_{21} = \mu_{23}$ 
$\mu_{22} = \mu_{23}$ 
$\mu_{31} = \mu_{32}$ 

* Hypotheses automatically tested by a SAS-GLM type IV analysis. 


Table 14.11. Similar results can be obtained for the type IV hypotheses for B. All possible 
type IV hypotheses for B are given in Table 14.12. 
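
The entries of Tables 14.11 and 14.12 can be generated mechanically from the pattern of observed cells in Table 14.10. The Python sketch below assumes, as in those tables, that each type IV hypothesis compares two levels of one factor averaged over a nonempty set of levels of the other factor at which both are observed. The names `OBSERVED`, `transpose`, and `type_iv_hypotheses` are ours and are introduced only for illustration.

```python
from itertools import combinations

# Observed cells of Table 14.10: T level -> set of B levels observed with it.
OBSERVED = {1: {1, 3}, 2: {1, 2, 3}, 3: {1, 2}}


def transpose(pattern):
    """Swap the roles of the two factors in an observed-cell pattern."""
    flipped = {}
    for row, cols in pattern.items():
        for col in cols:
            flipped.setdefault(col, set()).add(row)
    return flipped


def type_iv_hypotheses(pattern):
    """List (level_a, level_b, common_levels) for every pairwise comparison
    averaged over a nonempty set of common levels of the other factor."""
    found = []
    for a, b in combinations(sorted(pattern), 2):
        common = sorted(pattern[a] & pattern[b])
        for size in range(len(common), 0, -1):
            for subset in combinations(common, size):
                found.append((a, b, subset))
    return found


if __name__ == "__main__":
    for a, b, levels in type_iv_hypotheses(OBSERVED):
        print(f"T{a} vs T{b} averaged over B levels {levels}")
    for a, b, levels in type_iv_hypotheses(transpose(OBSERVED)):
        print(f"B{a} vs B{b} averaged over T levels {levels}")
```

For the pattern in Table 14.10 the enumeration yields seven comparisons for T and seven for B, the same counts as in Tables 14.11 and 14.12.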

SAS-GLM automatically generates type IV hypotheses that can usually be interpreted, 
but an appropriate interpretation cannot be made without first examining the type IV estimable 
functions to see exactly what hypotheses SAS-GLM generated. That is, there is no 
unique interpretation appropriate for all data sets as there is in the case when there are no 
missing treatment combinations. In fact, relabeling the treatments before doing the analysis 
may result in different type IV hypotheses being generated, and hence there will be 
different sums of squares and F-values in the type IV analysis. Thus, the type IV analysis 
obtained is not a unique characteristic of the data and depends on how the treatments are 
labeled. Obviously, this is not very desirable, but it is unavoidable. SAS-GLM indicates this 
situation has occurred by placing an asterisk on the printed degrees of freedom and noting 
that "Other Type IV Testable Hypotheses exist which may yield different 
SS." The type IV hypothesis for T that was automatically tested by an SAS-GLM analysis 
of the data in Table 13.1 is equivalent to simultaneously testing 

$$\mu_{11} = \mu_{31} \quad \text{and} \quad \frac{\mu_{21} + \mu_{22}}{2} = \frac{\mu_{31} + \mu_{32}}{2}.$$ 

Thus, the type IV hypothesis for T simultaneously compares the effect of $T_1$ and $T_3$ at level 
1 of B and the effect of $T_2$ and $T_3$ averaged over levels 1 and 2 of B. These two functions are 
identified with the asterisks in Table 14.11. We note that level 3 of B is not involved at all in 
this particular set. The type IV analysis of variance table obtained from SAS-GLM is given 
in Table 14.13. 

Type IV hypotheses not tested by SAS-GLM, but perhaps just as interesting to the experimenter 
as those that are automatically tested, are given in Table 14.11. In order to test such 
interesting type IV hypotheses, one can (and should) use Estimate or Contrast options. For 
example, to test all of the type IV hypotheses in Table 14.11 when using the effects model, 
we would use the following Estimate statements in a SAS-GLM analysis: 


ESTIMATE 'T1 VS T2 AVE OVER B1 AND B3' T 1 -1 0 T*B .5 .5 -.5 0 -.5 0 0; 

ESTIMATE 'T2 VS T3 AVE OVER B1 AND B2' T 0 1 -1 T*B 0 0 .5 .5 0 -.5 -.5; 

ESTIMATE 'T1 VS T2 AT B1' T 1 -1 0 T*B 1 0 -1 0 0 0 0; 

ESTIMATE 'T1 VS T3 AT B1' T 1 0 -1 T*B 1 0 0 0 0 -1 0; 

ESTIMATE 'T2 VS T3 AT B1' T 0 1 -1 T*B 0 0 1 0 0 -1 0; 

TABLE 14.13 

Type IV Analysis of Variance Table 

Source of Variation    df    SS         MS        F        α 
Total                  10    118.909 
T                       2*    12.769     6.385     8.51    0.0362 
B                       2*    70.179    35.089    46.79    0.0017 
TxB                     2     18.516     9.258    12.34    0.0194 
Error                   4      3.000     0.75 

* Other type IV testable hypotheses exist which may 
yield different SS. 


TABLE 14.14 

Results of ESTIMATE Options 

Parameter                        Estimate       Standard Error    t-Value    Pr > |t| 
T1 vs T2 ave over B1 and B3      -1.50000000    0.68465320        -2.19      0.0936 
T2 vs T3 ave over B1 and B2       1.00000000    0.81009259         1.23      0.2846 
T1 vs T2 at B1                   -0.00000000    1.06066017        -0.00      1.0000 
T1 vs T3 at B1                   -3.00000000    0.86602540        -3.46      0.0257 
T2 vs T3 at B1                   -3.00000000    1.06066017        -2.83      0.0474 
T2 vs T3 at B2                    5.00000000    1.22474487         4.08      0.0151 
T1 vs T2 at B3                   -3.00000000    0.86602540        -3.46      0.0257 


ESTIMATE 'T2 VS T3 AT B2' T 0 1 -1 T*B 0 0 0 1 0 0 -1; 

ESTIMATE 'T1 VS T2 AT B3' T 1 -1 0 T*B 0 1 0 0 -1 0 0; 

The results of the above Estimate statements are shown in Table 14.14. 
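
The coefficients in these Estimate statements follow from substituting the effects model into a contrast of cell means: each T level receives the sum of the cell coefficients in its row, each B level the sum in its column, and each T*B cell its own coefficient. The Python sketch below carries out this bookkeeping. The function names are hypothetical, and the ordering of the T*B cells (T1B1, T1B3, T2B1, T2B2, T2B3, T3B1, T3B2) is an assumption that matches the statements above; it should be checked against the class level ordering of the package actually used.

```python
# Observed cells (T level, B level), listed in the order assumed for T*B.
CELLS = [(1, 1), (1, 3), (2, 1), (2, 2), (2, 3), (3, 1), (3, 2)]


def effects_coefficients(contrast, cells=CELLS, t_levels=(1, 2, 3), b_levels=(1, 2, 3)):
    """Convert cell-mean coefficients {(i, j): c} into effects-model
    coefficients: row sums for T, column sums for B, cell values for T*B."""
    for cell in contrast:
        if cell not in cells:
            raise ValueError(f"cell {cell} was not observed")
    t = [sum(c for (i, _), c in contrast.items() if i == lvl) for lvl in t_levels]
    b = [sum(c for (_, j), c in contrast.items() if j == lvl) for lvl in b_levels]
    tb = [contrast.get(cell, 0) for cell in cells]
    return t, b, tb


def estimate_statement(label, contrast):
    """Format the coefficients as an ESTIMATE statement (illustrative only)."""
    def fmt(values):
        # Adding 0.0 turns a negative zero into a plain zero before printing.
        return " ".join(f"{v + 0.0:g}" for v in values)

    t, b, tb = effects_coefficients(contrast)
    parts = [f"ESTIMATE '{label}'"]
    if any(t):
        parts.append("T " + fmt(t))
    if any(b):
        parts.append("B " + fmt(b))
    parts.append("T*B " + fmt(tb))
    return " ".join(parts) + ";"


if __name__ == "__main__":
    ave = {(1, 1): 0.5, (1, 3): 0.5, (2, 1): -0.5, (2, 3): -0.5}
    print(estimate_statement("T1 VS T2 AVE OVER B1 AND B3", ave))
```

Because both rows in a type IV comparison are averaged over the same B levels, the column sums are zero and no B coefficients appear, which is why the statements above list only T and T*B.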

If the reader does not have SAS-GLM available, all of the hypotheses in Tables 14.11 and 
14.12 can be tested by using the means model and procedures similar to those given in 
Section 13.2. 


14.4 Population Marginal Means and Least Squares Means 

Population marginal means and least squares means are defined here in the same way that 
they were defined in Sections 9.5 and 10.6. However, if a particular treatment is not observed 
with all possibilities for the other treatment factor, then the corresponding population 
marginal mean is not estimable. In this case, the table of two-way cell means (for example, 
the $\hat{\mu}_{ij}$) can be used to compare each observed treatment combination with all other 
observed treatment combinations. If a data set is quite sparse, few if any population 
marginal means will be estimable. 

For the data in Table 13.1, $\bar{\mu}_{2\cdot}$ and $\bar{\mu}_{\cdot 1}$ are the only population marginal means that are 
estimable. Their best estimates are $\hat{\bar{\mu}}_{2\cdot} = 8.333$ and $\hat{\bar{\mu}}_{\cdot 1} = 4.00$, respectively. In general, the best 
estimate of $\sum_{ij} c_{ij}\mu_{ij}$ is $\sum_{ij} c_{ij}\hat{\mu}_{ij}$, and its estimated standard error is $\hat{\sigma}\sqrt{\sum_{ij} c_{ij}^2/n_{ij}}$, where the 
sums are taken over all nonempty cells. A $(1 - \alpha)100\%$ confidence interval for $\sum_{ij} c_{ij}\mu_{ij}$ is 
given by 

$$\sum_{ij} c_{ij}\hat{\mu}_{ij} \pm t_{\alpha/2,\nu}\,\hat{\sigma}\sqrt{\sum_{ij} c_{ij}^2/n_{ij}}$$ 

A t-statistic with $\nu$ degrees of freedom for testing $\sum_{ij} c_{ij}\mu_{ij} = 0$ is given by 

$$t = \frac{\sum_{ij} c_{ij}\hat{\mu}_{ij}}{\hat{\sigma}\sqrt{\sum_{ij} c_{ij}^2/n_{ij}}}$$ 

In both instances, $\nu$ is the degrees of freedom corresponding to $\hat{\sigma}^2$, the error mean square. 
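
As an illustration of the standard error and t-statistic formulas, the Python sketch below reproduces two rows of Table 14.14. The cell sizes are not listed in this chapter; the values used here were back-calculated from the standard errors in Table 14.14 and should be treated as an assumption to be checked against Table 13.1. The function names are ours. The confidence interval is omitted because it needs a t quantile, which the Python standard library does not supply.

```python
import math

# ASSUMPTION: cell sizes n_ij back-calculated from the standard errors in
# Table 14.14; they are not printed in this chapter.
CELL_SIZES = {(1, 1): 2, (1, 3): 2, (2, 1): 1, (2, 2): 1,
              (2, 3): 2, (3, 1): 2, (3, 2): 1}
MSE = 0.75  # error mean square with 4 degrees of freedom (Table 14.1)


def contrast_se(coeffs, cell_sizes=CELL_SIZES, mse=MSE):
    """Estimated standard error sqrt(MSE * sum(c_ij^2 / n_ij)) of a
    linear combination of cell means, summed over the cells involved."""
    return math.sqrt(mse * sum(c * c / cell_sizes[cell] for cell, c in coeffs.items()))


def t_statistic(estimate, se):
    """t = estimate / standard error, for testing that the combination is zero."""
    return estimate / se


if __name__ == "__main__":
    ave = {(1, 1): 0.5, (1, 3): 0.5, (2, 1): -0.5, (2, 3): -0.5}
    se = contrast_se(ave)
    print(f"SE = {se:.8f}, t = {t_statistic(-1.5, se):.2f}")
```

With these cell sizes the sketch returns the standard errors 0.68465320 and 1.22474487 shown in Table 14.14 for the first and sixth comparisons, which is the only evidence offered here that the assumed sizes are right.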

For researchers wishing to make multiple comparisons, we recommend using the 
observed p-values given by the above t-tests whenever the F-value for comparing all treatment 
combinations is significant. If this F-value is not significant, then one should use 
Bonferroni's method on all comparisons of interest. That is, declare that linear combinations 
are significantly different from zero if the p-value obtained is less than $\alpha/p$, where $p$ 
is the total number of planned comparisons. For data snooping and unplanned comparisons, 
one should use a Scheffé procedure. 
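
The Bonferroni rule can be sketched in a few lines of Python. The p-values below are copied from Table 14.14; treating all seven comparisons as planned and taking $\alpha = 0.05$ are assumptions made only for this illustration, and the function name is ours.

```python
# p-values copied from the Pr > |t| column of Table 14.14.
P_VALUES = {
    "T1 vs T2 ave over B1 and B3": 0.0936,
    "T2 vs T3 ave over B1 and B2": 0.2846,
    "T1 vs T2 at B1": 1.0000,
    "T1 vs T3 at B1": 0.0257,
    "T2 vs T3 at B1": 0.0474,
    "T2 vs T3 at B2": 0.0151,
    "T1 vs T2 at B3": 0.0257,
}


def bonferroni_significant(p_values, alpha=0.05):
    """Flag each comparison whose p-value is below alpha / (number of comparisons)."""
    threshold = alpha / len(p_values)
    return {label: p < threshold for label, p in p_values.items()}


if __name__ == "__main__":
    for label, flag in bonferroni_significant(P_VALUES).items():
        print(f"{label:30s} significant: {flag}")
```

With seven comparisons the threshold is 0.05/7, about 0.0071, so none of the comparisons in Table 14.14 would be declared significant under this rule.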


14.5 Computer Analyses 

The reader should use his or her own statistical package to analyze the examples given in 
this chapter and in Chapters 15 and 17. Comparing the results of the analyses so obtained 
with those given in this book will give the reader valuable insight into the kinds of 
hypotheses tested by the packages he or she is accustomed to using. We know of no package that 
handles the analysis of data with missing treatment combinations adequately or completely. 
Several do a good job with unbalanced data provided that there are no missing 
treatment combinations. 

Anyone who does many statistical analyses on data with missing treatment combinations 
should learn how to use a package that allows a specified set of hypotheses to be 
tested. Then, and only then, can one be sure that the hypotheses tested are reasonable, 
meaningful, and interpretable. SAS-GLM and SAS-Mixed allow users to specify their 
own hypotheses. 


14.6 Concluding Remarks 

In summary, an acceptable analysis of data with missing treatment combinations requires 
a great deal of thought. An experimenter or statistician cannot simply run a computer 
program on the data and then select numbers from that program to report in a paper. 
Unfortunately, this has been done and is being done by uninformed experimenters and 
data analysts. We hope that anyone who has studied this chapter will never do so again. 
Those willing to exert the necessary effort to analyze their data correctly are advised to 
use the means model discussed in Chapter 13. 

A more realistic example is discussed in Chapter 15. 


14.7 Exercises 

14.1 The data in Exercise 13.1 were collected from a two-way treatment structure in a 
completely randomized design structure. Use an effects model to answer the 
following questions. 

1) List all possible type IV hypotheses for A. 

2) List all possible type IV hypotheses for B. 

3) Find the type IV sums of squares for both A and B that are automatically 
computed by SAS-GLM. 

4) Give the hypotheses being tested by the analysis in part 3. 

5) Compute LSMEANS for A, for B, and for AxB ; perform all possible multiple 
comparisons among the two-way AxB means. 

6) Compute another type IV sum of squares for both A (with 3 degrees of 
freedom) and B (with 2 degrees of freedom) that are different from the type 
IV sum of squares automatically given by SAS-GLM in part 3. 

7) Compare the analyses from Exercise 13.1 and this one, that is, compare the 
means model analysis to the effects model analysis. 





Case Study: Two-Way Treatment Structure 
with Missing Treatment Combinations 


15.1 Case Study 

In Chapters 13 and 14 we discussed the analysis of two-way treatment structures in a completely 
randomized design structure when there are missing treatment combinations. In 
this chapter we illustrate how to analyze a two-way treatment structure in a randomized 
complete block design when some treatment combinations are not observed. Consider the 
data in Table 15.1, which are obtained from the experiment described in Chapter 12, but with 
a few treatment combinations not being observed in any of the blocks. Figure 15.1 shows 
the treatment combinations observed at least once. Any hypothesis that involves treatment 
combinations (fat 1, surfactant 3) or (fat 2, surfactant 2) cannot be tested unless additional 
assumptions are made. In this discussion, let $FS_{ij}$ represent the response expected when fat 
$i$ and surfactant $j$ are assigned to a randomly selected experimental unit. 

To get the error sum of squares for this experiment, we can fit either an effects model or 
a means model in a randomized block design structure using any of the available statistical 
packages. The model to fit an effects model is 

MODEL SPVOL = BLK FAT SURF FAT*SURF; 

and the model to fit a means model for the two-way Fat x Surfactant combinations is 

MODEL SPVOL = BLK FAT*SURF / NOINT; 

After fitting either of these models, one finds that the error sum of squares is equal to 
2.0941, with 11 degrees of freedom. Thus 

$\hat{\sigma}^2 = 2.0941/11 = 0.1904$ 

If all treatment combinations had been observed, there would have been 4 degrees of 
freedom for interaction hypotheses. Since two treatment combinations are never observed, 

TABLE 15.1 

Specific Volumes from the Baking Experiment in Chapter 12 
(a hyphen marks a value that was not observed) 

                           Flour 
Fat    Surfactant    1      2      3      4 
1      1             6.7    4.3    5.7    - 
1      2             7.1    -      5.9    5.6 
1      3             -      -      -      - 
2      1             -      5.9    7.4    7.1 
2      2             -      -      -      - 
2      3             6.4    5.1    6.2    6.3 
3      1             7.1    5.9    -      - 
3      2             7.3    6.6    8.1    6.8 
3      3             -      7.5    9.1    - 


only two degrees of freedom remain for interaction hypotheses. Two independent contrasts 
in the interaction space are $FS_{11} - FS_{12} - FS_{31} + FS_{32}$ and $FS_{21} - FS_{23} - FS_{31} + FS_{33}$. 
An SAS®-GLM type IV analysis tests these two contrasts equal to zero simultaneously. The 
value of the F-statistic for testing the two contrasts equal to zero simultaneously is 

F = (5.4002/2)/0.1904 = 14.18 

with 2 and 11 degrees of freedom. 
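
The degrees of freedom and the F-statistic above can be checked with a short Python sketch. It assumes the usual counting rule for a connected two-way layout, in which the interaction degrees of freedom equal the number of observed cells minus the number of rows minus the number of columns plus one; the function names are ours, and the sums of squares are those quoted in the text.

```python
# Observed (fat, surfactant) combinations, read from Figure 15.1.
OBSERVED_CELLS = {(1, 1), (1, 2), (2, 1), (2, 3), (3, 1), (3, 2), (3, 3)}


def interaction_df(cells):
    """Interaction df = observed cells - row levels - column levels + 1.
    ASSUMPTION: the observed cells form a connected design."""
    rows = {i for i, _ in cells}
    cols = {j for _, j in cells}
    return len(cells) - len(rows) - len(cols) + 1


def f_statistic(hypothesis_ss, hypothesis_df, error_ss, error_df):
    """F = (hypothesis SS / hypothesis df) / (error SS / error df)."""
    return (hypothesis_ss / hypothesis_df) / (error_ss / error_df)


if __name__ == "__main__":
    df = interaction_df(OBSERVED_CELLS)
    f = f_statistic(5.4002, df, 2.0941, 11)
    print(f"interaction df = {df}, F = {f:.2f}")
```

With all nine cells observed the same rule gives 9 - 3 - 3 + 1 = 4 degrees of freedom, in agreement with the count stated for the complete layout.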

All possible type IV hypotheses for fat are given in Table 15.2. The hypothesis in 4 was 
automatically tested by the SAS-GLM type IV analysis for fat. Hypotheses 1-3 can be tested 
with Contrast statements should one want to do so, and hypotheses 2 and 3 can also be 
examined using Estimate statements. The results of the statistical tests performed are also 
shown in Table 15.2. 

All possible type IV hypotheses for surfactant are specified in Table 15.3. The last 
equality in hypothesis 3 and hypothesis 5 were automatically tested by an SAS-GLM 


               Surfactant 
               1    2    3 
Fat    1       x    x    - 
       2       x    -    x 
       3       x    x    x 

FIGURE 15.1 Treatment combinations observed in baking experiment. 


TABLE 15.2 

Type IV Hypotheses for Fat 

     Hypothesis                                                                  df    F        p-Value 
1    $FS_{11} = FS_{21} = FS_{31}$                                               2      8.69    0.006 
2    $FS_{12} = FS_{32}$                                                         1     15.18    0.003 
3    $FS_{23} = FS_{33}$                                                         1     43.89    0.000 
4    $FS_{11} + FS_{12} = FS_{31} + FS_{32}$ and 
     $FS_{21} + FS_{23} = FS_{31} + FS_{33}$                                     2     13.25    0.001 

TABLE 15.3 

Type IV Hypotheses for Surfactant 

     Hypothesis                                      df    F       p-Value 
1    $FS_{11} = FS_{12}$                             1     0.85    0.376 
2    $FS_{21} = FS_{23}$                             1     9.11    0.012 
3    $FS_{31} = FS_{32} = FS_{33}$                   2     9.95    0.003 
4    $FS_{11} + FS_{31} = FS_{12} + FS_{32}$         1     2.64    0.132 
5    $FS_{21} + FS_{31} = FS_{23} + FS_{33}$         1     2.79    0.123 


type IV analysis for surfactant. The F-value given was F = 6.34 with 2 and 11 degrees of 
freedom. All five hypotheses can be tested using Contrast statements, and all but hypothesis 
3 can be tested with an Estimate statement. 

Since there is significant interaction in these data, it is probably best to compare all 
observed treatment combinations by examining the least squares means of the seven treatment 
combinations observed. The least squares means and the p-values for comparing 
them to one another using pairwise t-tests are given in Table 15.4. We noted that, when the 


TABLE 15.4 

Least Squares Means, t-Statistics and p-Values for Pairwise Comparisons 

Least Squares Means 

Fat    Surfactant    SPVOL LSMean    LSMean Number 
1      1             5.54120221      1 
1      2             5.88331748      2 
2      1             7.02225604      3 
2      3             6.00000000      4 
3      1             6.64163972      5 
3      2             7.20000000      6 
3      3             8.59518737      7 

Least Squares Means for Effect Fat x Surfactant 
t for H0: LSMean(i) = LSMean(j) / Pr > |t| 
Dependent Variable: SPVOL 

i/j    1           2           3           4           5           6           7 
1                  -0.92322    -3.98937    -1.35133    -2.71158    -4.88578    -7.50717 
                   0.3757      0.0021      0.2037      0.0202      0.0005      0.0001 
2      0.923221                -3.09314    -0.34525    -1.79001    -3.89591    -6.39489 
       0.3757                  0.0102      0.7364      0.1010      0.0025      0.0001 
3      3.989367    3.093142                3.019       0.894121    -0.52493    -3.85221 
       0.0021      0.0102                  0.0117      0.3904      0.6101      0.0027 
4      1.351332    0.34525     -3.019                  -1.6363     -3.88951    -6.62522 
       0.2037      0.7364      0.0117                  0.1300      0.0025      0.0001 
5      2.711581    1.790011    -0.89412    1.636298                -1.42392    -4.26043 
       0.0202      0.1010      0.3904      0.1300                  0.1822      0.0013 
6      4.885782    3.895912    0.524926    3.889509    1.42392                 -3.56176 
       0.0005      0.0025      0.6101      0.0025      0.1822                  0.0045 
7      7.507171    6.394888    3.852207    6.625224    4.260426    3.561758 
       <0.0001     <0.0001     0.0027      <0.0001     0.0013      0.0045 

Note: To ensure overall protection level, only probabilities associated with preplanned comparisons 
should be used. 

[Figure: SPVOL LSMean plotted against fat, with separate plotting symbols for surfactants 1, 2, and 3.] 

FIGURE 15.2 Fat x surfactant least squares means. Means located within the same circle are not significantly 
different. 


design structure is completely randomized, the best estimates of the population cell means 
are the means of the observations in each cell. (This is not true in a randomized block 
design.) The best estimates can easily be obtained with a computing package or by using 
the methods of Chapter 6. From such estimates, one can construct Figure 15.2, where means 
that are not significantly different have been enclosed in the same circle. 


15.2 Concluding Remarks 

This chapter illustrated the analysis of a two-way treatment structure experiment in a 
randomized complete block design when some treatment combinations are not observed. 


15.3 Exercise 

15.1 A study was conducted to investigate the effects of three different levels of exercise 
on the systolic blood pressure of female humans in five age groups. Subjects 
were selected from three different gyms where gym is considered to be a blocking 
factor. The subjects were randomly assigned to an exercise level within a gym 
and their systolic blood pressure was evaluated after six months on the program. 
Four subjects were selected from each age group at the beginning of the 
study, but several dropped out before the end of the six months, thus providing 
an unbalanced data set. The systolic blood pressure data are given in the following 
table. The age groups are 1 = (age < 25), 2 = (25 < age < 35), 3 = (35 < age < 45), 
4 = (45 < age < 55), and 5 = (age > 55). The exercise levels are 1 = (low intensity 
for 30 min), 2 = (medium intensity for 45 min), and 3 = (high intensity for 60 min), 
where each person exercises 3 days a week. 


Exercise Level    Age 1    Age 2    Age 3    Age 4    Age 5 

Gym = 1 
1                 -        -        132      134      138 
1                 -        -        -        -        137 
1                 -        -        -        -        129 
2                 -        118      121      142      - 
2                 -        129      123      140      - 
2                 -        115      -        131      - 
3                 110      109      126      -        - 
3                 116      117      120      -        - 
3                 113      -        125      -        - 
3                 107      -        108      -        - 

Gym = 2 
1                 -        -        116      -        136 
1                 -        -        128      -        - 
1                 -        -        134      -        - 
2                 -        114      118      124      - 
2                 -        108      118      125      - 
2                 -        121      115      132      - 
2                 -        -        118      135      - 
3                 -        98       118      -        - 
3                 -        119      123      -        - 
3                 -        118      -        -        - 
3                 -        110      -        -        - 

Gym = 3 
1                 -        -        137      139      142 
1                 -        -        141      148      147 
1                 -        -        144      129      141 
1                 -        -        133      137      - 
2                 -        120      120      137      - 
2                 -        120      122      143      - 
2                 -        129      117      127      - 
3                 118      117      138      -        - 
3                 110      123      -        -        - 


1) Determine the number of linearly independent interaction contrasts between 
age and exercise intensity, and write out one set. 

2) Use both the means model and the effects model to test the set of interaction 
contrasts that you gave in part 1. 





3) Determine all of the type IV estimable functions for age main effects. 

4) Determine all of the type IV estimable functions for the exercise intensity 
main effects. 

5) Carry out a complete analysis using the effects model (include tests of 
hypotheses, confidence intervals and multiple comparisons). 

6) Carry out a complete analysis using the means model (include tests of 
hypotheses, confidence intervals and multiple comparisons). 




Analyzing Three-Way and Higher-Order 
Treatment Structures 


In Chapters 7-15, we discussed the analysis of two-way treatment structures. The methods 
and results given in those nine chapters can be generalized to more complex treatment 
structures; such analyses become only slightly more complicated as the complexity of the 
treatment structure increases. We illustrate the method of generalization by specifically 
addressing the analysis of three-way treatment structures. 

In Section 16.1 we give a general strategy to follow when analyzing higher-order treatment 
structures. Section 16.2 discusses the analysis of balanced and unbalanced treatment 
structures. The discussion of unbalanced experiments includes the case where each treatment 
combination is observed at least once and the case where some treatment combinations 
are missing. 


16.1 General Strategy 

Suppose that treatments $T_i$, $B_j$, and $C_k$ are applied simultaneously to the same experimental 
unit. Let $\mu_{ijk}$ represent the expected response to the treatment combination $(T_i, B_j, C_k)$ for 
$i = 1, 2, \ldots, t$; $j = 1, 2, \ldots, b$; and $k = 1, 2, \ldots, c$. There is no three-way interaction among these 
treatment combinations provided that 

$$(\mu_{ijk} - \mu_{i'jk} - \mu_{ij'k} + \mu_{i'j'k}) - (\mu_{ijk'} - \mu_{i'jk'} - \mu_{ij'k'} + \mu_{i'j'k'}) = 0 \quad \text{for all } i, i', j, j', k, \text{ and } k'.$$ 

This implies that the TxB interaction at level $k$ of factor C is the same as the TxB interaction 
at level $k'$ of factor C for all values of $k$ and $k'$. Similarly, the TxC interaction is the 
same at all levels of factor B, and the BxC interaction is the same at all levels of factor T. 
Equivalent expressions of the no-interaction statements are: 

1) $\mu_{ijk} - \bar{\mu}_{ij\cdot} - \bar{\mu}_{i\cdot k} - \bar{\mu}_{\cdot jk} + \bar{\mu}_{i\cdot\cdot} + \bar{\mu}_{\cdot j\cdot} + \bar{\mu}_{\cdot\cdot k} - \bar{\mu}_{\cdot\cdot\cdot} = 0$ for all $i$, $j$, and $k$. 

2) There exist parameters $\mu$, $\tau_1, \tau_2, \ldots, \tau_t$, $\beta_1, \beta_2, \ldots, \beta_b$, $\xi_1, \xi_2, \ldots, \xi_c$, $\gamma_{11}, \gamma_{12}, \ldots, \gamma_{tb}$, $\eta_{11}, \eta_{12}, \ldots, \eta_{tc}$, 
$\theta_{11}, \theta_{12}, \ldots, \theta_{bc}$ such that 

$\mu_{ijk} = \mu + \tau_i + \beta_j + \xi_k + \gamma_{ij} + \eta_{ik} + \theta_{jk}$ for all $i$, $j$, and $k$ 

That is, the $\mu_{ijk}$ can be described by main effects and two-factor interaction effects. When
analyzing three-way treatment structures, the first and most important step is to determine
whether there is a three-factor interaction, even though the experimenter may not be
interested in it. If there is no three-factor interaction, then the second step is to determine
whether there are any two-factor interactions. If there are also no two-factor interactions,
then each of the main effects can be analyzed. If a three-factor interaction exists, the
experimenter should analyze the two-way treatment structures of two of the treatment factors at
each level of a selected third treatment factor, usually the factor of least interest. These
two-way analyses could be repeated with each treatment factor in turn serving as the selected third one.
The types of analyses that can be obtained with statistical computing packages are similar
to those available for two-way treatment structures.

FIGURE 16.1 Strategy for analyzing three-factor experiments.

Figure 16.1 presents a general strategy for analyzing three-way treatment structures. 
This strategy can also be applied to four-way and higher-order treatment structures. If all 
treatment combinations are observed an equal number of times, the resulting data can be 
analyzed by using many different kinds of statistical software. 
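
The no-interaction condition in statement 1) of this section can be checked numerically for any complete table of cell means. The following is a minimal Python sketch, not part of the original text: the function name `three_factor_terms` and the cell means are hypothetical and chosen only for illustration. It evaluates the left-hand side of statement 1) for every cell of a t x b x c array.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)

def three_factor_terms(mu):
    """Return mu_ijk - m_ij. - m_i.k - m_.jk + m_i.. + m_.j. + m_..k - m_... for every cell."""
    I, J, K = range(len(mu)), range(len(mu[0])), range(len(mu[0][0]))
    m_ij = [[mean(mu[i][j][k] for k in K) for j in J] for i in I]
    m_ik = [[mean(mu[i][j][k] for j in J) for k in K] for i in I]
    m_jk = [[mean(mu[i][j][k] for i in I) for k in K] for j in J]
    m_i = [mean(m_ij[i]) for i in I]
    m_j = [mean(m_ij[i][j] for i in I) for j in J]
    m_k = [mean(m_ik[i][k] for i in I) for k in K]
    m = mean(m_i)
    return [[[mu[i][j][k] - m_ij[i][j] - m_ik[i][k] - m_jk[j][k]
              + m_i[i] + m_j[j] + m_k[k] - m
              for k in K] for j in J] for i in I]

def largest(terms):
    return max(abs(x) for plane in terms for row in plane for x in row)

# Hypothetical 2 x 3 x 2 cell means built from main effects and two-factor
# interactions only, so no three-factor interaction is present.
no_tbc = [[[10 + 2 * i + 3 * j + k + 1.5 * i * j - 0.5 * i * k + 2 * j * k
            for k in range(2)] for j in range(3)] for i in range(2)]
max_no_tbc = largest(three_factor_terms(no_tbc))

# Shifting a single cell mean introduces a three-factor interaction.
with_tbc = [[row[:] for row in plane] for plane in no_tbc]
with_tbc[1][2][1] += 4.0
max_with_tbc = largest(three_factor_terms(with_tbc))

print(max_no_tbc, max_with_tbc)
```

In the first array every term is zero up to rounding error; in the second the terms are nonzero, which is the situation in which the strategy of Figure 16.1 calls for two-way analyses at each level of a third factor.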


16.2 Balanced and Unbalanced Experiments 

If all treatment combinations are observed, but observed an unequal number of times, then 
type III analyses can be used. If every treatment combination is observed at least once, all 
main-effect and interaction hypotheses can still be tested, and the questions answered are 
the same as those that would be answered with complete balance everywhere. 

If some treatment combinations are missing, then, as was the case in Chapter 13, no
hypothesis involving the missing treatment combinations can be tested. The experimenter
should specify her own type IV hypotheses between treatments of interest. Such
hypotheses can be tested by using the matrix procedure described in Chapter 1 or by contrast
statements available in many statistical computing packages.

The example in Chapter 17 demonstrates some of the steps that may be required in order 
to obtain a complete analysis of data with missing treatment combinations. 


16.3 Type I and II Analyses 

In Chapter 10, type I and II analyses were described for a two-way treatment structure 
experiment. Both of these analyses produce sums of squares for each of the effects using 
the model comparison procedure. The type I analysis fits models sequentially and each 
effect is adjusted for all of the other effects that preceded it in the model. In the type II 
analysis, the sum of squares for each effect is adjusted for all other effects that are at the 
same or lower level. Consider the three-way model given by 

$$y_{ijk\ell} = \mu + T_i + B_j + (TB)_{ij} + C_k + (TC)_{ik} + (BC)_{jk} + (TBC)_{ijk} + \varepsilon_{ijk\ell}$$
for $i = 1, 2, \ldots, t$; $j = 1, 2, \ldots, b$; $k = 1, 2, \ldots, c$; and $\ell = 1, 2, \ldots, n_{ijk}$

Suppose that $n_{ijk} > 0$ for all $i$, $j$, and $k$. That is, each of the three-way cells is observed at least
once. Table 16.1 shows the type I and II sums of squares along with their corresponding
degrees of freedom using the reduction notation described in Chapter 10.
TABLE 16.1
Type I and II Sums of Squares for a Three-Way Experiment with at Least One Observation in Each Cell

Source of Variation   df                      Type I SS                                  Type II SS
T                     t - 1                   R(T | μ)                                   R(T | μ, B, C)
B                     b - 1                   R(B | μ, T)                                R(B | μ, T, C)
T×B                   (t - 1)(b - 1)          R(T×B | μ, T, B)                           R(T×B | μ, T, B, C, T×C, B×C)
C                     c - 1                   R(C | μ, T, B, T×B)                        R(C | μ, T, B)
T×C                   (t - 1)(c - 1)          R(T×C | μ, T, B, T×B, C)                   R(T×C | μ, T, B, T×B, C, B×C)
B×C                   (b - 1)(c - 1)          R(B×C | μ, T, B, T×B, C, T×C)              R(B×C | μ, T, B, T×B, C, T×C)
T×B×C                 (t - 1)(b - 1)(c - 1)   R(T×B×C | μ, T, B, T×B, C, T×C, B×C)       R(T×B×C | μ, T, B, T×B, C, T×C, B×C)
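
The reduction notation in Table 16.1 can be illustrated by fitting a sequence of models and differencing their residual sums of squares, since R(effect | others) = SSE(others) - SSE(others, effect). The following is a minimal Python sketch under stated assumptions: the 2 x 2 x 2 data are hypothetical (not from the text), each factor has two levels so that one 0/1 column represents each effect, and the helper names `solve`, `sse`, and `reduction` are illustrative only.

```python
# Hypothetical unbalanced 2 x 2 x 2 data (not from the text): (i, j, k) -> responses.
cells = {
    (0, 0, 0): [12.0, 14.0], (0, 0, 1): [15.0],
    (0, 1, 0): [11.0],       (0, 1, 1): [18.0, 17.0, 19.0],
    (1, 0, 0): [16.0, 13.0], (1, 0, 1): [20.0, 22.0],
    (1, 1, 0): [14.0],       (1, 1, 1): [25.0, 23.0],
}

# Reference-coded (0/1) columns; with two levels per factor each effect has 1 df.
TERMS = {
    "mu":    lambda i, j, k: 1.0,
    "T":     lambda i, j, k: float(i),
    "B":     lambda i, j, k: float(j),
    "TxB":   lambda i, j, k: float(i * j),
    "C":     lambda i, j, k: float(k),
    "TxC":   lambda i, j, k: float(i * k),
    "BxC":   lambda i, j, k: float(j * k),
    "TxBxC": lambda i, j, k: float(i * j * k),
}

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def sse(terms):
    """Residual sum of squares after a least squares fit of the listed terms."""
    rows, ys = [], []
    for (i, j, k), obs in cells.items():
        for y in obs:
            rows.append([TERMS[t](i, j, k) for t in terms])
            ys.append(y)
    p = len(terms)
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * y for r, y in zip(rows, ys)) for a in range(p)]
    beta = solve(xtx, xty)
    return sum((y - sum(bv * xv for bv, xv in zip(beta, r))) ** 2
               for r, y in zip(rows, ys))

def reduction(effect, given):
    """R(effect | given) = SSE(given) - SSE(given, effect)."""
    return sse(given) - sse(given + [effect])

# Type I: each effect is adjusted for the effects that precede it in the model.
order = ["T", "B", "TxB", "C", "TxC", "BxC", "TxBxC"]
type1, fitted = {}, ["mu"]
for term in order:
    type1[term] = reduction(term, fitted)
    fitted.append(term)

# Two type II entries from Table 16.1, for comparison with the type I values.
type2_T = reduction("T", ["mu", "B", "C"])
type2_TxB = reduction("TxB", ["mu", "T", "B", "C", "TxC", "BxC"])

print(type1)
print(type2_T, type2_TxB)
```

Because the type I sums of squares are sequential, they add up to the difference between the residual sum of squares of the mean-only model and that of the full model; the type II values generally do not have this property when the cell sizes are unequal.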


16.4 Concluding Remarks 

This chapter discussed the analysis of three-way and higher-way treatment structure 
experiments. A flow chart was given that provides a general strategy for analyzing such 
experiments. 

It is important to examine the highest-order interaction effects first. Many experimenters 
avoid considering higher-order interactions because they are often not quite sure how to 
deal with these interactions. Such temptations should be avoided. Even though experience 
has shown us that very-high-order interactions are seldom significant, they must be dealt 
with whenever they are. The techniques discussed in Chapter 8 can be generalized to
three-way and higher-way treatment structures and may help determine which treatment
combinations are causing the interaction. The cause of the interaction may be the most
important information identified in the study.


16.5 Exercises 

16.1 Consider a study with four factors, denoted by A, B, C, and D, each at two levels 
for a total of 16 treatment combinations. The means model that can be used to 
describe the resulting data is 

$$y_{ijklm} = \mu_{ijkl} + \varepsilon_{ijklm}, \quad i = 1, 2;\; j = 1, 2;\; k = 1, 2;\; l = 1, 2;\; m = 1, 2, \ldots, n_{ijkl}, \text{ where } n_{ijkl} > 0.$$

Specify the contrasts of the $\mu_{ijkl}$ that measure the following:

1) The main effects of A, B, C, and D. 

2) All possible two-way interactions. 

3) All possible three-way interactions. 

4) The four-way interaction. 
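
One way to organize the contrasts asked for in Exercise 16.1 is to build them as products of signs, one sign per factor. The following is a minimal Python sketch under stated assumptions: the coding (-1 at level 1 and +1 at level 2) and the unscaled coefficients are one common convention and are not prescribed by the text, and the function name `contrast` is hypothetical.

```python
from itertools import combinations, product

FACTORS = "ABCD"
CELLS = list(product((1, 2), repeat=4))   # (i, j, k, l) in lexicographic order

def contrast(effect):
    """Coefficients on the mu_ijkl for an effect such as 'A' or 'ACD'.

    Each factor named in the effect contributes -1 at level 1 and +1 at level 2;
    factors not named contribute +1, so the contrast averages over them.
    """
    coefs = []
    for cell in CELLS:
        sign = 1
        for factor in effect:
            sign *= 1 if cell[FACTORS.index(factor)] == 2 else -1
        coefs.append(sign)
    return coefs

# 4 main effects, 6 two-way, 4 three-way, and 1 four-way interaction.
effects = ["".join(c) for r in range(1, 5) for c in combinations(FACTORS, r)]
contrasts = {e: contrast(e) for e in effects}

for e in ("A", "AB", "ABCD"):
    print(e, contrasts[e])
```

Each of the 15 coefficient vectors sums to zero and any two of them are orthogonal, so together they account for the 15 degrees of freedom among the 16 cell means.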
16.2 Use the following data to compute the contrasts, estimated standard errors and
95% confidence intervals for the contrasts in Exercise 16.1.

Data for Exercise 16.2

         A = 0, B = 0          A = 0, B = 1        A = 1, B = 0          A = 1, B = 1
         D = 0    D = 1        D = 0    D = 1      D = 0      D = 1      D = 0    D = 1
C = 0    6, 8     9            8, 5     11         7          13, 10     10, 11   14
C = 1    10       12, 10, 11   12, 10   12, 13     8, 9, 11   12, 16     14, 12   18, 16, 18





Case Study: Three-Way Treatment Structure 
with Many Missing Treatment Combinations 


In this chapter we show a detailed analysis of a three-way treatment structure when many 
treatment combinations are missing. 


17.1 Nutrition Scores Example 

A home economist conducted a sample survey experiment to study how much
lower-socioeconomic-level mothers knew about nutrition and to judge the effect of a training
program designed to increase their knowledge of nutrition. A test was administered to the
mothers both before and after the training program, and the changes in their test scores
were measured. These changes are reported in Table 17.1. The mothers tested were classified
according to three factors: age, race, and whether they were receiving food stamps.


17.2 An SAS-GLM Analysis 

Table 17.2 gives the type IV analysis of variance table for the data in Table 17.1 obtained 
from SAS®-GLM using the SAS commands: 

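PROC GLM;
   CLASSES GROUP AGE RACE;
   /* SOLUTION prints parameter estimates; E4 prints the type IV estimable functions */
   MODEL GAIN=GROUP|AGE|RACE/SOLUTION E4;
   /* Least squares means with standard errors and p-values for pairwise differences */
   LSMEANS GROUP|AGE|RACE/PDIFF STDERR;
RUN;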

Table 17.2 reveals that the estimate of $\sigma^2$ for the data in Table 17.1 is $\hat{\sigma}^2 = 2627.4724/92 = 28.56$
with 92 degrees of freedom. Table 17.2 also shows that there are zero degrees of freedom
for the three-factor interaction hypothesis, which indicates that there are no contrasts in
these data that can be used for estimating a three-factor interaction. This does not imply
that there is not a three-factor interaction among the factors group, age, and race, only that
there are no testable hypotheses in the three-factor interaction effects.

TABLE 17.1
Changes in Scores on a Nutrition Test between Post-Training and Pretraining

Age   Group            Race       Changes in Test Score
1     Food stamps      Black      4, 4
1     Food stamps      White      -8, 9
1     No food stamps   White      5, -2, -10
2     Food stamps      White      5, 0, 10, 3, 3, 7, 7, 4
2     No food stamps   Black      -4, -2, 0, 0, 5, -6, 2
2     No food stamps   White      7, 2, -13, 2, 3, 3, -4, -5
3     Food stamps      Black      1, 5, 15, 9
3     Food stamps      Hispanic   0
3     Food stamps      White      4, 5, 0, 5, 2, 8, 1, -2, 6, 6, 4, -5, 6, 3, 7, 4, 5,
                                  12, 3, 8, 3, 8, 13, 4, 7, 9, 3, 12, 11, 4, 12
3     No food stamps   Black      3, -14, -14, -1, 3, 1
3     No food stamps   Hispanic   -1, 6
3     No food stamps   White      -20, 6, 9, -5, 3, -1, 3, 0, 4, -3, 2, 3, -5, 2, -1,
                                  -1, 6, -8, 0, 2
4     Food stamps      Black      -3
4     Food stamps      White      -6, -5, 5, 8, 5, 6, 7, 6, 2, 7, 5
4     No food stamps   Black      0

Note: Age × group × race combinations that are not listed were not observed.
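
The error sum of squares and the estimate of the variance quoted from Table 17.2 can be reproduced directly from the data in Table 17.1, because in a cell means model the residual sum of squares is the pooled within-cell sum of squares. The following is a minimal Python sketch; the dictionary layout, the key coding ("Y" for food stamps, "N" for no food stamps), and the helper name `cell_summary` are illustrative choices, not part of the original analysis.

```python
# Changes in nutrition test scores from Table 17.1, keyed by (group, age, race).
gain = {
    ("Y", 1, "B"): [4, 4],
    ("Y", 1, "W"): [-8, 9],
    ("N", 1, "W"): [5, -2, -10],
    ("Y", 2, "W"): [5, 0, 10, 3, 3, 7, 7, 4],
    ("N", 2, "B"): [-4, -2, 0, 0, 5, -6, 2],
    ("N", 2, "W"): [7, 2, -13, 2, 3, 3, -4, -5],
    ("Y", 3, "B"): [1, 5, 15, 9],
    ("Y", 3, "H"): [0],
    ("Y", 3, "W"): [4, 5, 0, 5, 2, 8, 1, -2, 6, 6, 4, -5, 6, 3, 7, 4, 5,
                    12, 3, 8, 3, 8, 13, 4, 7, 9, 3, 12, 11, 4, 12],
    ("N", 3, "B"): [3, -14, -14, -1, 3, 1],
    ("N", 3, "H"): [-1, 6],
    ("N", 3, "W"): [-20, 6, 9, -5, 3, -1, 3, 0, 4, -3, 2, 3, -5, 2, -1,
                    -1, 6, -8, 0, 2],
    ("Y", 4, "B"): [-3],
    ("Y", 4, "W"): [-6, -5, 5, 8, 5, 6, 7, 6, 2, 7, 5],
    ("N", 4, "B"): [0],
}

def cell_summary(values):
    """Return (n, mean, within-cell sum of squares) for one cell."""
    n = len(values)
    mean = sum(values) / n
    return n, mean, sum((v - mean) ** 2 for v in values)

summary = {cell: cell_summary(v) for cell, v in gain.items()}
n_total = sum(n for n, _, _ in summary.values())
sse = sum(ss for _, _, ss in summary.values())   # error sum of squares
df_error = n_total - len(gain)                    # observations minus observed cells
mse = sse / df_error                              # estimate of sigma^2

print(n_total, df_error, round(sse, 4), round(mse, 4))
```

The 15 observed cells leave 107 - 15 = 92 degrees of freedom for error, and the cell means computed here are the three-way least squares means used in the rest of the chapter.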

The type IV F-values seem to suggest that there are no significant differences in any of the
main effects and two-way interactions. This seems a bit strange, particularly since a visual
examination of the data in Table 17.1 shows a large number of negative values for GAIN in
the group that did not receive food stamps while the values for GAIN in the group that did
receive food stamps are mainly positive. Thus, one would expect that there would be an
effect due to food stamps, at least in these two subgroups. Also note that the F-value that
compares the 15 cell means corresponding to the cells that have data is F = 2.67 with 14 and
92 degrees of freedom. Its corresponding p-value is 0.0026, which also seems to suggest that
there are significant differences among these 15 treatment combination means. Why, then, is
the test for the group main effect (F = 2.65, p = 0.1068) not significant? The answer, of course,
may lie in the hypothesis that is actually being tested by the type IV F-value for GROUP.

TABLE 17.2
Type IV Analysis of Variance Tables

The GLM Procedure
Dependent Variable: Gain

Source               df    Sum of Squares   Mean Square   F-Value   Pr > F
Model                14    1068.546279      76.324734     2.67      0.0026
Error                92    2627.472413      28.559483
Corrected total      106   3696.018692

R-Square   Coefficient of Variation   Root MSE   Gain Mean
0.289107   233.3957                   5.344107   2.289720

Source               df    Type IV SS       Mean Square   F-Value   Pr > F
Group                1a    75.7378078       75.7378078    2.65      0.1068
Age                  3a    41.5257840       13.8419280    0.48      0.6938
Group × age          3     91.5762463       30.5254154    1.07      0.3663
Race                 2a    11.6770165       5.8385082     0.20      0.8155
Group × race         2     113.7034419      56.8517209    1.99      0.1424
Age × race           3     87.3013862       29.1004621    1.02      0.3880
Group × age × race   0     0.0000000

a Other type IV testable hypotheses exist which may yield different SS.

Table 17.3 shows the hypotheses being tested by the SAS-GLM Type IV analysis. An 
examination of Table 17.3 reveals that the type IV hypothesis for group compares the food 
stamp group to the no food stamp group averaging over the six cell means that they have 
in common, namely the cells corresponding to (age = 1, white), (age = 2, white), (age = 3, 
black), (age = 3, Hispanic), (age = 3, white), and (age = 4, black). This seems like a very 
reasonable hypothesis as it averages across the maximum number of similar categories 
that one can average over. To further explore why this test is not significant, let us look at 
the standard error of the corresponding type IV contrast that would compare these two 
groups. The estimated standard error is 


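$$\text{s.e.}(\text{type IV group contrast}) = \hat{\sigma}\sqrt{\sum_{i,j,k} \frac{c_{ijk}^2}{n_{ijk}}}$$

$$= 5.344\sqrt{\left(\frac{1}{2} + \frac{1}{8} + \frac{1}{4} + \frac{1}{1} + \frac{1}{31} + \frac{1}{1}\right) + \left(\frac{1}{3} + \frac{1}{8} + \frac{1}{6} + \frac{1}{2} + \frac{1}{20} + \frac{1}{1}\right)} = 12.047$$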
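
The standard error of the type IV group contrast, and the F-value reported for GROUP in Table 17.2, can be checked from the cell sizes and cell means alone. The following is a minimal Python sketch; it assumes contrast coefficients of +1 and -1 on the six cells that the two groups have in common (which is what the arithmetic of the standard error implies), and the variable names are illustrative.

```python
from math import sqrt

root_mse = 5.344107   # Root MSE from Table 17.2 (92 error degrees of freedom)

# (cell size, cell mean) for the six age-race cells observed in both groups;
# sizes come from Table 17.1 and means are the three-way least squares means.
no_stamps = {"1W": (3, -2.33333333), "2W": (8, -0.625), "3B": (6, -3.66666667),
             "3H": (2, 2.5), "3W": (20, -0.2), "4B": (1, 0.0)}
stamps = {"1W": (2, 0.5), "2W": (8, 4.875), "3B": (4, 7.5),
          "3H": (1, 0.0), "3W": (31, 5.41935484), "4B": (1, -3.0)}

# Contrast estimate: sum of the no-food-stamp means minus sum of the food-stamp means.
estimate = (sum(m for _, m in no_stamps.values())
            - sum(m for _, m in stamps.values()))

# With coefficients of +1 or -1, the sum of c^2 / n reduces to the sum of 1 / n.
sum_c2_over_n = (sum(1 / n for n, _ in no_stamps.values())
                 + sum(1 / n for n, _ in stamps.values()))
std_error = root_mse * sqrt(sum_c2_over_n)
f_value = (estimate / std_error) ** 2

# Contribution of the three single-observation cells alone.
floor = root_mse * sqrt(1 / 1 + 1 / 1 + 1 / 1)

print(round(estimate, 3), round(std_error, 3), round(f_value, 2), round(floor, 3))
```

The squared ratio of the estimate to its standard error agrees with the type IV F-value of 2.65 for GROUP, and the three cells with a single observation account for more than half of the sum under the square root.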


Notice that the size of the standard error depends more on the small sample sizes within
the cells than it does on the large sample sizes. Since the type IV contrast for groups
involves three cells which have only one observation in them, the observed standard error
must be greater than

$$5.344\sqrt{\frac{1}{1} + \frac{1}{1} + \frac{1}{1}} = 9.256$$

regardless of the numbers of observations in the other cells. This illustrates another aspect
of data analysis that data analysts must be aware of. Certain tests may not have much
power associated with them if the corresponding hypotheses involve cells that have small
sample sizes. This could have been true even if all cells were observed. That is, one may
have low power even when one has a balanced treatment structure if some of the cell
sample sizes are small.

TABLE 17.3
Type IV Hypotheses Tested by SAS-GLM

Source of Variation: Group
Hypothesis: $\mu_{N1W} + \mu_{N2W} + \mu_{N3B} + \mu_{N3H} + \mu_{N3W} + \mu_{N4B} = \mu_{Y1W} + \mu_{Y2W} + \mu_{Y3B} + \mu_{Y3H} + \mu_{Y3W} + \mu_{Y4B}$

Source of Variation: Age
Hypothesis: $\mu_{Y1B} + \mu_{Y1W} = \mu_{Y4B} + \mu_{Y4W}$, $\mu_{N2B} + \mu_{Y2W} = \mu_{N4B} + \mu_{Y4W}$, and
$\mu_{N3B} + \mu_{Y3B} + \mu_{Y3W} = \mu_{N4B} + \mu_{Y4B} + \mu_{Y4W}$

Source of Variation: Group × age
Hypothesis: $(\mu_{N1W} - \mu_{N3W} - \mu_{Y1W} + \mu_{Y3W}) + (\mu_{N3B} - \mu_{N4B} - \mu_{Y3B} + \mu_{Y4B}) = 0$,
$(\mu_{N2W} - \mu_{N3W} - \mu_{Y2W} + \mu_{Y3W}) + (\mu_{N3B} - \mu_{N4B} - \mu_{Y3B} + \mu_{Y4B}) = 0$, and
$\mu_{N3B} - \mu_{N4B} - \mu_{Y3B} + \mu_{Y4B} = 0$

Source of Variation: Race
Hypothesis: $\mu_{N2B} + \mu_{N3B} + \mu_{Y1B} + \mu_{Y3B} + \mu_{Y4B} = \mu_{N2W} + \mu_{N3W} + \mu_{Y1W} + \mu_{Y3W} + \mu_{Y4W}$, and
$\mu_{N3H} + \mu_{Y3H} = \mu_{N3W} + \mu_{Y3W}$

Source of Variation: Group × race
Hypothesis: $\mu_{N3B} - \mu_{N3W} - \mu_{Y3B} + \mu_{Y3W} = 0$, and $\mu_{N3H} - \mu_{N3W} - \mu_{Y3H} + \mu_{Y3W} = 0$

Source of Variation: Age × race
Hypothesis: $\mu_{Y1B} - \mu_{Y1W} - \mu_{Y4B} + \mu_{Y4W} = 0$, $\mu_{N2B} - \mu_{N2W} - \mu_{N3B} + \mu_{N3W} = 0$, and
$\mu_{Y3B} - \mu_{Y3W} - \mu_{Y4B} + \mu_{Y4W} = 0$

Source of Variation: Group × age × race
Hypothesis: None

It seems likely that few, if any, of the hypotheses automatically tested by the type IV
analysis of SAS-GLM will be of particular interest to the experimenter for these data. We
do not even consider the type I-III hypotheses, since they usually make little sense in cases
where there are missing cells.

In this kind of a messy experiment, the safest and easiest way to obtain useful information
is to look at the three-way least squares means, and pairwise comparisons between
them. Table 17.4 gives the three-way least squares means and their estimated standard
errors, and Table 17.5 gives p-values corresponding to pairwise comparisons among the
least squares means.

Suppose we wish to test the type IV group hypothesis that $\mu_{Y3W} = \mu_{N3W}$. That is, how do
the two cell means that have the largest sample sizes compare to one another? These two
cells correspond to LSMeans 6 and 13 in Table 17.5, and the corresponding p-value that
compares these two means is p = 0.0004. Thus, there is a highly significant difference due
to food stamps for the age = 3 and race = white subgroup of mothers. Next, consider age = 2,
whites. The corresponding cell means are LSMeans 3 and 10, and the corresponding
p-value is p = 0.0424, which also indicates a significant difference due to food stamps.
Finally, consider the groups corresponding to age = 3, blacks. These two means correspond
to LSMeans 4 and 11, and the corresponding p-value is p = 0.0017. Thus, it seems clear that
if one only considers cells that have reasonable sample sizes, then one finds that there are
significant differences due to food stamps.

TABLE 17.4
Three-Way Least Squares Means

Group   Age   Race   Gain LSMean   Standard Error   Pr > |t|   LSMean Number
N       1     W      -2.33333333   3.08542178       0.4514     1
N       2     B      -0.71428571   2.01988270       0.7244     2
N       2     W      -0.62500000   1.88942725       0.7416     3
N       3     B      -3.66666667   2.18172267       0.0962     4
N       3     H       2.50000000   3.77885451       0.5099     5
N       3     W      -0.20000000   1.19497872       0.8674     6
N       4     B       0.00000000   5.34410729       1.0000     7
Y       1     B       4.00000000   3.77885451       0.2926     8
Y       1     W       0.50000000   3.77885451       0.8950     9
Y       2     W       4.87500000   1.88942725       0.0115     10
Y       3     B       7.50000000   2.67205365       0.0061     11
Y       3     H       0.00000000   5.34410729       1.0000     12
Y       3     W       5.41935484   0.95983000       <0.0001    13
Y       4     B      -3.00000000   5.34410729       0.5759     14
Y       4     W       3.63636364   1.61130898       0.0264     15
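
The pairwise p-values quoted above follow from the least squares means, the cell sizes, and the root mean square error with 92 degrees of freedom. The following is a minimal Python sketch using only the standard library; the t-distribution tail area is obtained by numerical integration (Simpson's rule) rather than by a statistical package, and the function names `t_pdf`, `two_sided_p`, and `compare` are hypothetical helpers for illustration.

```python
from math import exp, lgamma, log, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c - (df + 1) / 2 * log(1 + x * x / df))

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 - 2 * (integral of the density from 0 to |t|)."""
    b = abs(t)
    h = b / steps                      # steps is even, as Simpson's rule requires
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for s in range(1, steps):
        total += (4 if s % 2 else 2) * t_pdf(s * h, df)
    return 1.0 - 2.0 * total * h / 3.0

root_mse, df_error = 5.344107, 92      # from Table 17.2

# LSMean number -> (cell size from Table 17.1, least squares mean from Table 17.4)
cells = {
    3: (8, -0.625),      10: (8, 4.875),       # age = 2, white: N versus Y
    4: (6, -3.66666667), 11: (4, 7.5),         # age = 3, black: N versus Y
    6: (20, -0.2),       13: (31, 5.41935484), # age = 3, white: N versus Y
}

def compare(a, b):
    """Return (t statistic, two-sided p-value) for LSMean b minus LSMean a."""
    (na, ma), (nb, mb) = cells[a], cells[b]
    se = root_mse * sqrt(1 / na + 1 / nb)
    t = (mb - ma) / se
    return t, two_sided_p(t, df_error)

for a, b in ((6, 13), (3, 10), (4, 11)):
    t, p = compare(a, b)
    print(a, b, round(t, 3), round(p, 4))
```

Each comparison involves only two cells, so its standard error depends only on those two cell sizes; this is why comparisons between well-replicated cells can be significant even though the type IV test for GROUP is not.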


17.3 A Complete Analysis 

Since we cannot test for three-factor interaction and hence do not know whether there is a
three-factor interaction, we next examine two-way analyses at each level of a third
treatment factor. Let us suppose that the experimenter is most interested in the effects of, or
differences between, the race × group combinations. We thus examine these two-way
combinations at each level of the age factor, as shown in Table 17.6.

For age = 1 in Table 17.6, we observe the following: 1) It is not possible to test for
race × group interaction in this age group, since no contrast exists that measures a
two-factor interaction; 2) the only type IV hypothesis comparing groups that can be tested is
$\mu_{Y1W} = \mu_{N1W}$; 3) the only type IV hypothesis in the age = 1 group that concerns races that
can be tested is $\mu_{Y1W} = \mu_{Y1B}$; and 4) the hypothesis $\mu_{Y1B} = \mu_{N1W}$ can also be tested in the
age = 1 group, but this hypothesis is probably of secondary interest since it involves
different levels in both race and group. From Table 17.5, the p-value for testing $\mu_{Y1W} = \mu_{N1W}$
is 0.5628. The p-values corresponding to the hypotheses in 3 and 4 are 0.5141 and
0.1975, respectively.

Tables 17.7 and 17.8 give the testable hypotheses, the p-values of their respective test 
statistics, and a (subjective) importance rating for the age groups, age = 2 and age = 4, 
respectively. 

Finally, we examine the age = 3 group. In this group all race × group combinations are
observed; hence, it is possible to test for race × group interaction. If this interaction is not
significant, we can examine the main-effect means for both race and group in the age = 3
subgroup of mothers. These three hypotheses are specified by

$$H_{01}: \mu_{N3B} - \mu_{N3W} - \mu_{Y3B} + \mu_{Y3W} = 0 \text{ and } \mu_{N3H} - \tfrac{1}{2}(\mu_{N3B} + \mu_{N3W}) - \mu_{Y3H} + \tfrac{1}{2}(\mu_{Y3B} + \mu_{Y3W}) = 0,$$

$$H_{02}: \bar{\mu}_{Y3\cdot} = \bar{\mu}_{N3\cdot},$$

and

$$H_{03}: \bar{\mu}_{\cdot 3B} = \bar{\mu}_{\cdot 3H} = \bar{\mu}_{\cdot 3W},$$

respectively.

TABLE 17.6
Observed Race × Group Combinations for Each Age Level for the Data in Table 17.1

                    Race
          Group     Black   Hispanic   White
Age = 1   No                           X
          Yes       X                  X
Age = 2   No        X                  X
          Yes                          X
Age = 3   No        X       X          X
          Yes       X       X          X
Age = 4   No        X
          Yes       X                  X

Note: "X" indicates that the cell was observed at least once in the experiment.

TABLE 17.7
Testable Hypotheses for Age = 2 for Data in Table 17.1

Hypothesis                  p-Value   Importance
$\mu_{N2W} = \mu_{N2B}$     0.974     Primary
$\mu_{Y2W} = \mu_{N2W}$     0.042     Primary
$\mu_{Y2W} = \mu_{N2B}$     0.046     Secondary

Note: The p-values are from Table 17.5.

TABLE 17.8
Testable Hypotheses for Age = 4 for Data in Table 17.1

Hypothesis                  p-Value   Importance
$\mu_{Y4B} = \mu_{N4B}$     0.692     Primary
$\mu_{Y4B} = \mu_{Y4W}$     0.238     Primary
$\mu_{Y4W} = \mu_{N4B}$     0.516     Secondary

Note: The p-values are from Table 17.5.

The test statistic for $H_{01}$ was actually given by the original SAS-GLM analysis shown in
Table 17.2, while the tests of the other two hypotheses were not. Hypothesis $H_{01}$ is
equivalent to the one tested by the race × group type IV F-value, which can be seen by examining
the estimable functions for race × group in Table 17.3. The tests for these three hypotheses
can be obtained by using the Contrast statements in SAS-GLM, as illustrated below, or by
hand. In this case, it is probably easier to do the testing by hand. When using SAS-GLM, it
is also easier to use a means model. With this model, only the coefficients corresponding to
the group × age × race effect need to be entered. The SAS-GLM Contrast statements needed
for an effects model are shown in Table 17.9.

The test statistics for the above three hypotheses can be obtained from SAS-GLM by 
including the following Contrast statements with the SAS commands given in Section 17.2. 
The required statements are given in Table 17.9. The results from these options are given in 
Table 17.10. 
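
The hand computation mentioned above can be sketched with the means model: for a set of contrasts C on the cell means, the sum of squares is (C m)' [C D C']^{-1} (C m), where m holds the cell means and D is diagonal with entries 1/n. The following minimal Python sketch applies this to the six age = 3 cells; the cell ordering, the function name `contrast_ss`, and the explicit 2 x 2 inverse are illustrative choices, not the procedure used by SAS-GLM.

```python
root_mse = 5.344107
mse = root_mse ** 2          # error mean square with 92 degrees of freedom

# Age = 3 cells in the order N3B, N3H, N3W, Y3B, Y3H, Y3W:
n = [6, 2, 20, 4, 1, 31]                                   # sizes (Table 17.1)
m = [-3.66666667, 2.5, -0.2, 7.5, 0.0, 5.41935484]          # means (Table 17.4)

def contrast_ss(rows):
    """Sum of squares for one or two contrast rows on the cell means."""
    est = [sum(c * v for c, v in zip(row, m)) for row in rows]
    cov = [[sum(a * b / k for a, b, k in zip(r1, r2, n)) for r2 in rows]
           for r1 in rows]
    if len(rows) == 1:
        return est[0] ** 2 / cov[0][0]
    # Quadratic form with the inverse of a symmetric 2 x 2 matrix written out.
    det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
    return (est[0] ** 2 * cov[1][1] - 2 * est[0] * est[1] * cov[0][1]
            + est[1] ** 2 * cov[0][0]) / det

h01 = [[1, 0, -1, -1, 0, 1], [-0.5, 1, -0.5, 0.5, -1, 0.5]]       # race x group
h02 = [[1, 1, 1, -1, -1, -1]]                                      # group
h03 = [[0.5, -0.5, 0, 0.5, -0.5, 0], [0.5, 0, -0.5, 0.5, 0, -0.5]] # race

results = {}
for name, rows in (("H01", h01), ("H02", h02), ("H03", h03)):
    ss = contrast_ss(rows)
    results[name] = (ss, ss / len(rows) / mse)   # (contrast SS, F-value)
    print(name, len(rows), round(ss, 4), round(results[name][1], 2))
```

The contrast rows are the group × age × race coefficients of Table 17.9 restricted to the age = 3 cells, and the resulting sums of squares agree with those reported in Table 17.10.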

If the experimenter is also interested in the effects of the race × age combinations, these
combinations can be analyzed at each value of the group factor. A similar situation exists
if the experimenter wanted to examine the effects of the group × age combinations for
each race. Both of these analyses can be done by hand, by using Contrast statements,
or by using three-way least squares means when possible, as illustrated at the beginning
of this section for the different levels of the age factor.

TABLE 17.9
Contrast Statements for Testing $H_{01}$-$H_{03}$

For H01:
CONTRAST 'H01' GROUP 0 0 AGE 0 0 0 0 GROUP*AGE 0 0 0 0 0 0 0 0
   RACE 0 0 0 GROUP*RACE 1 0 -1 -1 0 1 AGE*RACE 0 0 0 0 0 0 0 0 0 0 0 0
   GROUP*AGE*RACE 0 0 0 1 0 -1 0 0 0 0 -1 0 1 0 0,
   GROUP 0 0 AGE 0 0 0 0 GROUP*AGE 0 0 0 0 0 0 0 0 RACE 0 0 0
   GROUP*RACE -.5 1 -.5 .5 -1 .5 AGE*RACE 0 0 0 0 0 0 0 0 0 0 0 0
   GROUP*AGE*RACE 0 0 0 -.5 1 -.5 0 0 0 0 .5 -1 .5 0 0;

For H02:
CONTRAST 'H02' GROUP 3 -3 AGE 0 0 0 0 GROUP*AGE 0 0 3 0 0 0 -3 0
   GROUP*RACE 1 1 1 -1 -1 -1 AGE*RACE 0 0 0 0 0 0 0 0 0
   GROUP*AGE*RACE 0 0 0 1 1 1 0 0 0 0 -1 -1 -1 0 0;

For H03:
CONTRAST 'H03' GROUP 0 0 AGE 0 0 0 0 GROUP*AGE 0 0 0 0 0 0 0 0
   RACE 1 -1 0 GROUP*RACE .5 -.5 0 .5 -.5 0 AGE*RACE 0 0 0 0 1 -1 0 0 0
   GROUP*AGE*RACE 0 0 0 .5 -.5 0 0 0 0 0 .5 -.5 0 0 0,
   GROUP 0 0 AGE 0 0 0 0 GROUP*AGE 0 0 0 0 0 0 0 0 RACE 1 0 -1
   GROUP*RACE .5 0 -.5 .5 0 -.5 AGE*RACE 0 0 0 0 1 0 -1 0 0
   GROUP*AGE*RACE 0 0 0 .5 0 -.5 0 0 0 0 .5 0 -.5 0 0;

TABLE 17.10
Results from the SAS Contrast Statements

The GLM Procedure
Dependent Variable: Gain

Contrast   df   Contrast SS    Mean Square    F-Value   Pr > F
H01        2    113.7034419    56.8517209     1.99      0.1424
H02        1    102.1000978    102.1000978    3.57      0.0618
H03        2    7.8054511      3.9027256      0.14      0.8724

The analysis of higher-order cross-classified treatment structures can be carried out in 
ways similar to those illustrated in this chapter. 


17.4 Concluding Remarks 

This chapter presented the analysis of a three-way treatment structure having a large 
number of missing treatment combinations. An SAS-GLM analysis was obtained and 
interpreted. Questions not answered by the SAS-GLM analysis were also raised, and 
techniques for answering these questions were illustrated. 
17.5 Exercises

17.1 The following data was collected from a three-way treatment structure in a 
completely randomized design structure. Use a means model to answer the 
following questions. 


Level 1 of C
      B1           B2           B3
A1    21, 24       23, 23, 27   35
A2    -            18, 16       28, 23, 25
A3    37, 37       -            34

Level 2 of C
      B1           B2           B3
A1    32           -            18, 21
A2    27, 29, 26   35           -
A3    37, 32       34, 36, 42   16, 13, 15

Level 3 of C
      B1           B2           B3
A1    26, 22       23, 25, 20   22
A2    30, 33       -            40
A3    -            27, 30, 33   36, 38

Note: "-" indicates a treatment combination that was not observed.

1) Use the means model to provide a complete analysis of the data. 

2) Determine all of the type IV hypotheses for A. 

3) Determine all of the type IV hypotheses for AxB. 

4) Determine a set of linearly independent contrasts that measure three-way 
interaction. 

17.2 The data for Exercise 17.1 was collected from a three-way treatment structure in 
a completely randomized design structure. 

1) Use the effects model to provide a complete analysis of the data. 

2) Compare the analyses from the effects model with that of the means model. 

17.3 The following data are from a four-way treatment structure in a completely 
randomized design structure. 


Data for Exercise 17.3

         A = 0, B = 0      A = 0, B = 1      A = 1, B = 0      A = 1, B = 1
         D = 0    D = 1    D = 0    D = 1    D = 0    D = 1    D = 0        D = 1
C = 0    7, 8     -        9        12, 11   12, 10   11, 16   13           18, 15
C = 1    -        10       -        11       -        -        -            16, 18
C = 2    9        10, 13   -        14       12, 14   14, 17   13, 15, 16   -
C = 3    10, 11   -        13       -        15       -        18, 17       -

Note: "-" indicates a treatment combination that was not observed.
1) Determine the numbers of degrees of freedom for each of the two-way, three-way,
and four-way interactions.

2) Use a means model to carry out an analysis of this data by determining the
important factors as well as making the necessary comparisons. Be sure to
use multiple comparisons when needed.

3) Use an effects model to carry out an analysis of this data by determining the
important factors as well as making the necessary comparisons. Be sure to
use multiple comparisons when needed.




Random Effects Models and Variance Components 


Models with more than one random component are applied to several situations, including
random effects and mixed effects models where some or all of the factors in the treatment
structure are random or where there are several sizes of experimental units as in split-plot
and repeated measures designs. The parameters of interest for such models include the
variances associated with the distributions of the random components (usually called
variance components). It is important to be able to identify the random components of a model
and be able to utilize them in the analysis of the model. When carrying out an analysis of
variance for a given model, the expected values of the mean squares (which are functions of
the variance components) are needed in order to construct proper test statistics and
determine standard errors for comparisons of fixed effect parameters. It is also important to be
able to obtain estimates of the variance components and test hypotheses and construct
confidence intervals about functions of the variance components. The discussion of random
effects models and methods of analyzing them is divided into four chapters. This chapter
defines the random effects model and describes a general procedure for computing
expectations of sums of squares. The procedure can easily be used by computer software to
evaluate the expectations of sums of squares. The problem of estimation is discussed in Chapter 19,
methods for testing hypotheses and constructing confidence intervals are presented in
Chapter 20, and a detailed analysis of an example is presented in Chapter 21.


18.1 Introduction 

The philosophy behind the use of random effects models is quite different from that behind
the use of the fixed-effects models (discussed in the previous chapters) in both the
sampling scheme and the parameters of interest. Before these differences are discussed, the
definitions of a random effect and a fixed effect are given.

Definition 18.1: A factor is a random effect factor if its levels consist of a random sample 
of levels from a population of possible levels. 

Definition 18.2: A factor is a fixed effect factor if its levels are selected by a nonrandom 
process or if its levels consist of the entire population of possible levels. 
Thus, in order to determine whether a factor is a fixed effect or a random effect, one
needs to know how the experimenter selected the levels of that factor. If all possible levels
of the factor or a set of selected levels of the factor are included in the experiment, the
factor is considered as a fixed effect. If some form of randomization is used to select the
levels included in the experiment, then the factor is a random effect.

Rule: The levels of a factor are fixed until proven random. 

To establish that the levels of a factor are random, the population or conceptual
population of possible levels must be described and the method of randomly selecting the levels
of the factor must be specified. Inferences are to be made to the population of levels, so if
that population is not describable, the inferences may not be meaningful.

For example, suppose a plant breeder wants to study a characteristic (say, yield) of wheat
varieties. There are many possible wheat varieties (a population of varieties), but if he wants
to study a certain set of varieties, then he would select just those varieties for his
experiment. In this case, the factor "variety" is called a fixed effect, since the levels of varieties are
chosen or fixed. However, if the plant breeder is interested in how a characteristic is
distributed among the varieties in the population, then he is not interested in which set of
varieties is included in the experiment. In this case, the plant breeder can randomly select the
varieties to be included in the experiment from the population of varieties. Therefore the
factor variety in this second experiment is a random effect. When constructing a model to
describe a given experimental situation, it must be stated whether a factor is a random or
fixed effect. The models considered in the previous chapters were constructed under the
assumption that all factors in the treatment structure were fixed effects and there was one
size of experimental unit. However, the idea of a random effect was alluded to when
blocking was introduced in Chapter 4, where it was assumed that the factor "blocks" was a
random effect. Further, a basic assumption is that those factors associated with the design
structure of a model are random effects. Some factors can have both a set of levels that are
fixed effects and a set that are random effects (Njuho and Milliken, 2005), but that topic is
not elaborated on here. Three basic types of models can be constructed, depending on the
number of sizes of experimental units in the design structure and assumptions about the
factors in the treatment structure. These types of models are defined below.

Definition 18.3: A model is called a fixed or fixed effects model if all of the factors in the 
treatment structure are fixed effects and there is only one size of experimental unit in the 
study, no blocking, and the variances are all equal. 

Definition 18.4: A model is called a random or random effects model if all of the factors 
in the treatment structure are random effects (all of the factors in the design structure are 
already assumed to be random effects). 

Definition 18.5: A model is called a mixed or mixed effects model if some of the factors in 
the treatment structure are fixed effects and some are random effects, or if all of the factors 
in the treatment structure are fixed effects and there is more than one size of experimental 
unit in the design structure (or there are at least two variance components in the model). 

The models discussed in this chapter are all random effects models. The discussion of 
mixed models is presented in Chapters 22 and 23. The following example is presented to 
help motivate the application and analysis of random effects models. 
18.1.1 Example 18.1: Random Effects Nested Treatment Structure

A consumer group studied the variation in coffee prices in U.S. cities with populations of
at least 20,000. Three factors that the group wished to investigate were states, cities within
states, and stores within cities within states. The treatment structure or sampling design is
a three-way, two-level, nested system involving states, cities, and stores, where cities are
nested within states and stores are nested within cities. The sampling procedure was to
select $r$ states at random ($r < 50$) from the population of all possible states. Next, randomly
select $t_i$ cities from the $C_i$ cities in the $i$th state ($t_i < C_i$) with populations of at least 20,000.
Finally, randomly select $n_{ij}$ stores ($n_{ij} < S_{ij}$) from the $S_{ij}$ stores in the $j$th city in the $i$th state
and determine the price of a particular grade of coffee at each randomly selected store.
A model that can be used to describe the variability in the coffee prices is

$$y_{ijk} = \mu + s_i + c_{j(i)} + a_{k(ij)}, \quad i = 1, 2, \ldots, r;\; j = 1, 2, \ldots, t_i;\; k = 1, 2, \ldots, n_{ij}$$

where $\mu$ denotes the average price of coffee in the United States, $s_i$ denotes the effect of the
$i$th randomly selected state, $c_{j(i)}$ denotes the effect of the $j$th randomly selected city from the
$i$th state, and $a_{k(ij)}$ denotes the effect of the $k$th randomly selected store from the $j$th city in
the $i$th state. The assumptions are that

1) The $s_i$ are distributed i.i.d. $N(0, \sigma^2_{\text{State}})$.

2) The $c_{j(i)}$ are distributed i.i.d. $N(0, \sigma^2_{\text{City}})$.

3) The $a_{k(ij)}$ are distributed i.i.d. $N(0, \sigma^2_{\text{Store}})$.

It is also assumed that all of the random effects are distributed independently of one
another. The parameters in this random effects model are $\mu$, $\sigma^2_{\text{State}}$, $\sigma^2_{\text{City}}$, and $\sigma^2_{\text{Store}}$. The
terms $s_i$, $c_{j(i)}$, and $a_{k(ij)}$ are random variables and are not parameters in the model. Most
applications are only interested in the estimation of the parameters and are not interested
in predicting the values of the random variables. But, when predictions of the random
variables are of interest, estimated best linear unbiased predictors (EBLUP) of the random
effects can be obtained (Littell et al., 2006; Milliken and Johnson, 2002).

The variance of a coffee price can be expressed through the parameters of the model as 


$$\operatorname{Var}(y_{ijk}) = \sigma^2_{\text{Price}} = \sigma^2_{\text{State}} + \sigma^2_{\text{City}} + \sigma^2_{\text{Store}}$$


The covariance between two coffee prices from stores within the same city (and state) is 


$$\operatorname{Cov}(y_{ij1}, y_{ij2}) = \sigma^2_{\text{State}} + \sigma^2_{\text{City}}$$


The correlation between these two prices is 


$$\rho_{y_{ij1} y_{ij2}} = \frac{\sigma^2_{\text{State}} + \sigma^2_{\text{City}}}{\sigma^2_{\text{State}} + \sigma^2_{\text{City}} + \sigma^2_{\text{Store}}}$$

The covariance between two coffee prices from stores in different cities within the same state
is $\operatorname{Cov}(y_{i11}, y_{i21}) = \sigma^2_{\text{State}}$. The correlation between two prices from two different
cities within the same state is

$$\rho_{y_{i11} y_{i21}} = \frac{\sigma^2_{\text{State}}}{\sigma^2_{\text{State}} + \sigma^2_{\text{City}} + \sigma^2_{\text{Store}}}$$


Thus, when a model includes random effects, a correlation structure similar to that for 
the coffee prices is imposed on the resulting data. It is of interest to be able to construct 
models and then identify the sources of variation as well as the correlations among the 
various groupings of observations. The following examples are used to demonstrate the 
construction of models with one or more random effects and to evaluate the resulting 
implied covariance or correlation structure among the observations in the data set. 
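The relationships in Example 18.1 can be checked numerically. The following is a minimal Python sketch; the function name and the three variance component values are hypothetical choices made for illustration and do not come from the coffee price study.

```python
def nested_price_moments(var_state, var_city, var_store):
    """Variance, covariances, and correlations implied by the nested model."""
    var_price = var_state + var_city + var_store
    cov_same_city = var_state + var_city      # same state and same city
    cov_same_state = var_state                # same state, different cities
    return {
        "var_price": var_price,
        "cov_same_city": cov_same_city,
        "corr_same_city": cov_same_city / var_price,
        "cov_same_state": cov_same_state,
        "corr_same_state": cov_same_state / var_price,
    }

# Hypothetical variance components (assumed values, not from the text).
moments = nested_price_moments(var_state=0.04, var_city=0.02, var_store=0.06)
print(moments)
```

With these assumed values, two prices from the same city share half of the total variance, while two prices from different cities in the same state share one third of it.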


18.2 General Random Effects Model in Matrix Notation 

In order to describe methods used to evaluate the expectations of sums of squares, it is 
necessary to have some general notation to describe the general random effects model. 
This section presents a matrix representation of a general random effects model, which is 
used in later sections to demonstrate general methods for computing expected mean 
squares. To help visualize the general random effects model and its expression in terms 
of matrices, a random effects model for a one-way treatment structure in a completely 
randomized design structure is examined. 

18.2.1 Example 18.2: One-Way Random Effects Model 

A model describing a one-way random effects treatment structure in a completely randomized
design structure is

$$y_{ij} = \mu + u_i + \varepsilon_{ij}, \quad i = 1, 2, \ldots, t, \; j = 1, 2, \ldots, n_i \tag{18.1}$$

where $\mu$ is the population mean of the response, $u_i$ denotes the effect of the $i$th randomly
selected treatment and is assumed to be distributed i.i.d. $N(0, \sigma_u^2)$, and $\varepsilon_{ij}$ denotes the
random error of the $j$th observation of the $i$th treatment, where it is assumed the $\varepsilon_{ij}$ are
distributed i.i.d. $N(0, \sigma_\varepsilon^2)$. It is also assumed that the $u_i$ and $\varepsilon_{ij}$ are independent random
variables. These assumptions allow the variances and covariances of the observations to be
evaluated. The variance of an observation is

$$\operatorname{Var}(y_{ij}) = \sigma_y^2 = \operatorname{Var}(\mu + u_i + \varepsilon_{ij}) = \operatorname{Var}(u_i) + \operatorname{Var}(\varepsilon_{ij}) = \sigma_u^2 + \sigma_\varepsilon^2$$


There are two components in the variance of $y_{ij}$: the variance of the population
of treatments or levels of $u$ and the variance of the experimental units, hence the name
variance components or components of variance. The covariance of two observations
obtained from the same $u_i$ is

$$\operatorname{Cov}(y_{ij}, y_{ij'}) = \operatorname{Cov}(\mu + u_i + \varepsilon_{ij}, \; \mu + u_i + \varepsilon_{ij'}) = \operatorname{Cov}(u_i, u_i) = \operatorname{Var}(u_i) = \sigma_u^2$$

The covariance between two observations obtained from different $i$ values is zero. Hence,
observations obtained from the same $i$ are correlated and this correlation is called the
intraclass correlation and is defined as

$$\rho = \frac{\operatorname{Cov}(y_{ij}, y_{ij'})}{\sqrt{\operatorname{Var}(y_{ij}) \operatorname{Var}(y_{ij'})}} = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_\varepsilon^2}$$

Observations from different i values are uncorrelated. The model in Equation 18.1 can be 
described in matrix notation as 


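$$y = j\mu + Z_1 u + \varepsilon \tag{18.2}$$

where $j$ is an $N \times 1$ vector of ones ($N = \sum_{i=1}^{t} n_i$), $Z_1$ is an $N \times t$ design matrix, $u$ is the $t \times 1$
vector random variable assumed to be distributed as the multivariate normal distribution
$N_t(0, \sigma_u^2 I_t)$, and $\varepsilon$ is the $N \times 1$ vector random variable assumed to be distributed $N_N(0, \sigma_\varepsilon^2 I_N)$.
The covariance matrix of the vector of observations, $y$, is

$$\begin{aligned}
\Sigma = \operatorname{Var}(y) &= \operatorname{Var}(j\mu + Z_1 u + \varepsilon) \\
&= Z_1 \operatorname{Var}(u) Z_1' + \operatorname{Var}(\varepsilon) \\
&= \sigma_u^2 Z_1 Z_1' + \sigma_\varepsilon^2 I_N
\end{aligned}$$

The variances of the $y_{ij}$ are the diagonal elements of $\Sigma$ and the covariances between pairs
of $y_{ij}$ are the off-diagonal elements of $\Sigma$.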
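The structure of the covariance matrix of the one-way model can be seen in a small case. The sketch below builds $Z_1$ and $\sigma_u^2 Z_1 Z_1' + \sigma_\varepsilon^2 I_N$ with plain Python lists; the group sizes (2 and 3), the variance components (3 and 1), and the function names are assumptions chosen only for illustration.

```python
from fractions import Fraction

def one_way_design(n):
    """Rows of Z1: one indicator row per observation, groups in order."""
    t = len(n)
    return [[1 if col == i else 0 for col in range(t)]
            for i in range(t) for _ in range(n[i])]

def one_way_covariance(n, var_u, var_e):
    """Sigma = var_u * Z1 Z1' + var_e * I_N, returned as a list of lists."""
    Z = one_way_design(n)
    N = len(Z)
    return [[var_u * sum(a * b for a, b in zip(Z[r], Z[c]))
             + (var_e if r == c else 0)
             for c in range(N)] for r in range(N)]

# Hypothetical group sizes and variance components.
sigma = one_way_covariance([2, 3], var_u=Fraction(3), var_e=Fraction(1))
intraclass = sigma[0][1] / sigma[0][0]   # correlation within the first group
for row in sigma:
    print([str(v) for v in row])
```

The printed matrix is block diagonal: observations that share a level of $u$ have covariance $\sigma_u^2$, and observations from different levels have covariance zero.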

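Equivalently, the model can be written as

$$
\begin{bmatrix} y_{11} \\ y_{12} \\ \vdots \\ y_{1n_1} \\ y_{21} \\ y_{22} \\ \vdots \\ y_{2n_2} \\ \vdots \\ y_{t1} \\ y_{t2} \\ \vdots \\ y_{tn_t} \end{bmatrix}
=
\begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \\ 1 \\ 1 \\ \vdots \\ 1 \\ \vdots \\ 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix} \mu
+
\begin{bmatrix}
1 & 0 & \cdots & 0 \\
1 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 1 \\
0 & 0 & \cdots & 1 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 1
\end{bmatrix}
\begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_t \end{bmatrix}
+
\begin{bmatrix} \varepsilon_{11} \\ \varepsilon_{12} \\ \vdots \\ \varepsilon_{1n_1} \\ \varepsilon_{21} \\ \varepsilon_{22} \\ \vdots \\ \varepsilon_{2n_2} \\ \vdots \\ \varepsilon_{t1} \\ \varepsilon_{t2} \\ \vdots \\ \varepsilon_{tn_t} \end{bmatrix}
$$
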


The general random effects model will have $r$ random components representing the main
effects and interactions for the random effect factors of the treatment structure and for
those factors used to describe the design structure as well as possible interactions between
components of the design and treatment structures used to describe the necessary error
terms. The general random effects model includes an overall mean parameter denoted by $\mu$
as well as $\varepsilon$, the vector representing the residual or smallest size of experimental unit errors.
The general random effects model can be expressed in matrix notation as

$$y = j\mu + Z_1 u_1 + Z_2 u_2 + \cdots + Z_r u_r + \varepsilon \tag{18.3}$$

where the $u_s$, $s = 1, 2, \ldots, r$, denote random effects and $\varepsilon$ denotes the residual error and
where all of these random variables are assumed to be distributed independently with
assumed marginal distributions

$$u_1 \sim N(0, \sigma_1^2 I_{t_1}), \quad u_2 \sim N(0, \sigma_2^2 I_{t_2}), \quad \ldots, \quad u_r \sim N(0, \sigma_r^2 I_{t_r}), \quad \text{and} \quad \varepsilon \sim N(0, \sigma_\varepsilon^2 I_N)$$

$N$ is the total number of observations in the data vector, $y$, and the $Z_s$ are the $N \times t_s$ design
matrices corresponding to the $s$th random effect vector. This general random effects model
can be expressed as


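$$y = j_N \mu + Zu + \varepsilon \quad \text{where} \quad Z = [Z_1, Z_2, \ldots, Z_r],$$

$$u = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_r \end{bmatrix} \quad \text{and} \quad \operatorname{Var}(u) = \begin{bmatrix} \sigma_1^2 I_{t_1} & 0 & \cdots & 0 \\ 0 & \sigma_2^2 I_{t_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_r^2 I_{t_r} \end{bmatrix}$$

Consequently,

$$\operatorname{Var}(y) = Z \operatorname{Var}(u) Z' + \sigma_\varepsilon^2 I_N$$

The covariance matrix of the data vector $y$ is

$$\begin{aligned}
\Sigma = \operatorname{Var}(y) &= \operatorname{Var}(j\mu + Z_1 u_1 + Z_2 u_2 + \cdots + Z_r u_r + \varepsilon) \\
&= Z_1 \operatorname{Var}(u_1) Z_1' + Z_2 \operatorname{Var}(u_2) Z_2' + \cdots + Z_r \operatorname{Var}(u_r) Z_r' + \operatorname{Var}(\varepsilon) \\
&= \sigma_1^2 Z_1 Z_1' + \sigma_2^2 Z_2 Z_2' + \cdots + \sigma_r^2 Z_r Z_r' + \sigma_\varepsilon^2 I_N
\end{aligned}$$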

This general form of the random effects model can be used in many situations to help
identify the sources of variation in the data collection system. One method of looking at
these sources of variation is to compute a set of sums of squares corresponding to each
random effect and then determine the functions of the parameters that each sum of squares
estimates. The functions of the variance components being estimated by each effect sum of
squares can be evaluated by obtaining the expected values of the effects sums of squares.
In the next section, this matrix form of the general random effects model is used to describe
the method for evaluating the expectations of sums of squares involving the observations.


18.3 Computing Expected Mean Squares 

The expected values of the sums of squares from an analysis of variance of a random 
effects model involve the variance components. For a given model, at least two methods 
can be used to evaluate the expected mean squares (remember that a mean square is a sum
of squares divided by its degrees of freedom). The first method is to algebraically evaluate
the expected values by using the model assumptions and the second method is to evaluate 
the expected values by means of a computer algorithm. The algebraic method is presented 
by applying it to the sum of squares obtained from the analysis of a model with a one-way 
random effects treatment structure. The computer algorithm method is discussed in 
general terms and demonstrated by more complex examples. 

18.3.1 Algebraic Method 

There are two variance components in the one-way random effects model in Equation 18.1, 
thus two sums of squares are used in the analysis to describe the variability in the response. 
The two sums of squares that are usually computed are the sum of squares within levels of 
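the random effect, designated $Q_1$, and the sum of squares between the levels of the random
effect, designated $Q_2$. For the one-way random effects model, these sums of squares are
given by

$$Q_1 = \sum_{i=1}^{t} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i.})^2 = \sum_{i=1}^{t} \sum_{j=1}^{n_i} y_{ij}^2 - \sum_{i=1}^{t} n_i \bar{y}_{i.}^2$$

and

$$Q_2 = \sum_{i=1}^{t} n_i (\bar{y}_{i.} - \bar{y}_{..})^2 = \sum_{i=1}^{t} n_i \bar{y}_{i.}^2 - n_{.} \bar{y}_{..}^2$$

where $n_{.} = \sum_{i=1}^{t} n_i = N$. In terms of the random variables of model (18.1), the quantities in $Q_1$ and $Q_2$ are expressed as

$$\begin{aligned}
y_{ij} &= \mu + u_i + \varepsilon_{ij} \\
\bar{y}_{i.} &= \mu + u_i + \bar{\varepsilon}_{i.} \quad \text{and} \\
\bar{y}_{..} &= \mu + \bar{u}_{.} + \bar{\varepsilon}_{..} \quad \text{where} \quad \bar{u}_{.} = \frac{1}{N} \sum_{i=1}^{t} n_i u_i
\end{aligned} \tag{18.4}$$

Substituting the terms from Equation 18.4 into $Q_1$, the expression becomes

$$Q_1 = \sum_{i=1}^{t} \sum_{j=1}^{n_i} [(\mu + u_i + \varepsilon_{ij}) - (\mu + u_i + \bar{\varepsilon}_{i.})]^2 = \sum_{i=1}^{t} \sum_{j=1}^{n_i} (\varepsilon_{ij} - \bar{\varepsilon}_{i.})^2$$

The expectation of $Q_1$ can be evaluated using properties of the distribution of $\varepsilon$ as

$$\begin{aligned}
E(Q_1) &= \sum_{i=1}^{t} \sum_{j=1}^{n_i} E(\varepsilon_{ij} - \bar{\varepsilon}_{i.})^2 = \sum_{i=1}^{t} \sum_{j=1}^{n_i} [E(\varepsilon_{ij}^2) + E(\bar{\varepsilon}_{i.}^2) - 2E(\varepsilon_{ij} \bar{\varepsilon}_{i.})] \quad \text{(by squaring)} \\
&= \sum_{i=1}^{t} \sum_{j=1}^{n_i} \left[ \sigma_\varepsilon^2 + \frac{\sigma_\varepsilon^2}{n_i} - 2\frac{\sigma_\varepsilon^2}{n_i} \right] \quad \text{using } E(\varepsilon_{ij}^2) = \sigma_\varepsilon^2, \; E(\bar{\varepsilon}_{i.}^2) = \frac{\sigma_\varepsilon^2}{n_i}, \text{ and } E(\varepsilon_{ij} \bar{\varepsilon}_{i.}) = \frac{\sigma_\varepsilon^2}{n_i} \\
&= \sum_{i=1}^{t} \sum_{j=1}^{n_i} \frac{n_i - 1}{n_i} \sigma_\varepsilon^2 = \sum_{i=1}^{t} (n_i - 1) \sigma_\varepsilon^2 = (N - t) \sigma_\varepsilon^2
\end{aligned}$$

Using this expression for the expectation of the sum of squares, the expected mean
square is

$$E(\text{mean square of } Q_1) = E\left( \frac{Q_1}{N - t} \right) = \sigma_\varepsilon^2$$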


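Substituting the expressions in Equation 18.4 into the equation defining $Q_2$ provides

$$\begin{aligned}
Q_2 &= \sum_{i=1}^{t} n_i [(\mu + u_i + \bar{\varepsilon}_{i.}) - (\mu + \bar{u}_{.} + \bar{\varepsilon}_{..})]^2 \\
&= \sum_{i=1}^{t} n_i [(u_i - \bar{u}_{.}) + (\bar{\varepsilon}_{i.} - \bar{\varepsilon}_{..})]^2
\end{aligned}$$

The expectation of $Q_2$ is

$$\begin{aligned}
E(Q_2) &= \sum_{i=1}^{t} n_i [E(u_i - \bar{u}_{.})^2 + E(\bar{\varepsilon}_{i.} - \bar{\varepsilon}_{..})^2] \\
&= \sum_{i=1}^{t} n_i [E(u_i^2) + E(\bar{u}_{.}^2) - 2E(u_i \bar{u}_{.}) + E(\bar{\varepsilon}_{i.}^2) + E(\bar{\varepsilon}_{..}^2) - 2E(\bar{\varepsilon}_{i.} \bar{\varepsilon}_{..})]
\end{aligned}$$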


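To simplify this, the expectation of $Q_2$ is evaluated in two parts. The first part evaluates the
expectation involving the $\varepsilon$. The distributions associated with the means $\bar{\varepsilon}_{i.}$ and $\bar{\varepsilon}_{..}$ are

$$\bar{\varepsilon}_{i.} \sim N\left(0, \frac{\sigma_\varepsilon^2}{n_i}\right) \quad \text{and} \quad \bar{\varepsilon}_{..} \sim N\left(0, \frac{\sigma_\varepsilon^2}{N}\right)$$

so that $E(\bar{\varepsilon}_{i.}^2) = \sigma_\varepsilon^2 / n_i$ and $E(\bar{\varepsilon}_{..}^2) = \sigma_\varepsilon^2 / N$. Because $\bar{\varepsilon}_{..} = \sum_{i=1}^{t} n_i \bar{\varepsilon}_{i.} / N$ and the $\bar{\varepsilon}_{i.}$ are
independent, $E(\bar{\varepsilon}_{i.} \bar{\varepsilon}_{..}) = (n_i / N)(\sigma_\varepsilon^2 / n_i) = \sigma_\varepsilon^2 / N$. The part of the expectation of $Q_2$ involving
the $\varepsilon$ is therefore

$$\sum_{i=1}^{t} n_i \left[ \frac{\sigma_\varepsilon^2}{n_i} + \frac{\sigma_\varepsilon^2}{N} - 2\frac{\sigma_\varepsilon^2}{N} \right] = t\sigma_\varepsilon^2 - \sigma_\varepsilon^2 = (t - 1)\sigma_\varepsilon^2$$

To evaluate the part of the expectation of $Q_2$ involving the $u_i$, let $\bar{u}_{.} = \sum_{i=1}^{t} (n_i u_i / N)$. Since the
$u_i$ are independent with mean equal to zero,

$$\operatorname{Var}(\bar{u}_{.}) = E(\bar{u}_{.}^2) = \sum_{i=1}^{t} \left( \frac{n_i^2}{N^2} \right) \sigma_u^2$$

The covariance between $u_i$ and $\bar{u}_{.}$ is

$$\operatorname{Cov}(u_i, \bar{u}_{.}) = E(u_i \bar{u}_{.}) = \frac{n_i}{N} E(u_i^2) = \frac{n_i}{N} \sigma_u^2$$

since $E(u_i u_{i'}) = 0$ for $i \ne i'$. Putting these pieces together, the part of the expectation of $Q_2$
involving the $u_i$ is

$$\sum_{i=1}^{t} n_i [E(u_i^2) + E(\bar{u}_{.}^2) - 2E(u_i \bar{u}_{.})] = \sum_{i=1}^{t} n_i \left[ \sigma_u^2 + \sum_{k=1}^{t} \frac{n_k^2}{N^2} \sigma_u^2 - 2\frac{n_i}{N} \sigma_u^2 \right] = \left( N - \frac{\sum_{i=1}^{t} n_i^2}{N} \right) \sigma_u^2$$
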

Finally, putting the two parts together, the expectation of $Q_2$ is

$$E(Q_2) = (t - 1)\sigma_\varepsilon^2 + \left( N - \frac{\sum_{i=1}^{t} n_i^2}{N} \right) \sigma_u^2$$

and the expected mean square for $Q_2$ is

$$E\left( \frac{Q_2}{t - 1} \right) = \sigma_\varepsilon^2 + \frac{1}{t - 1} \left( N - \frac{\sum_{i=1}^{t} n_i^2}{N} \right) \sigma_u^2$$

An analysis of variance table with sources, degrees of freedom, sums of squares, and
expected mean squares for the one-way random effects model is given in Table 18.1.
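The expected mean squares just derived depend on the design only through the group sizes $n_i$. The sketch below evaluates the coefficient of $\sigma_u^2$ in the between mean square with exact fractions; the function name and the unbalanced sizes (2, 3, 5) are hypothetical, chosen for illustration.

```python
from fractions import Fraction

def one_way_expected_mean_squares(n):
    """Coefficients (of sigma_e^2, of sigma_u^2) in each expected mean square."""
    t, N = len(n), sum(n)
    # (N - sum(n_i^2) / N) / (t - 1), written with a common denominator
    coef_u = Fraction(N * N - sum(k * k for k in n), N * (t - 1))
    return {"between": (Fraction(1), coef_u),
            "within": (Fraction(1), Fraction(0))}

balanced = one_way_expected_mean_squares([4, 4, 4, 4])
unbalanced = one_way_expected_mean_squares([2, 3, 5])   # assumed sizes
print(balanced["between"], unbalanced["between"])
```

In the balanced case the coefficient reduces to the common group size (here 4); with unequal sizes it is smaller than the average group size whenever the $n_i$ differ.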


18.3.2 Computing Using Hartley's Method of Synthesis 

Hartley (1967) described a technique for evaluating expectations of mean squares for a 
specific design and sum of squares that he called the method of synthesis. In order to 
describe and later apply the technique, the expectation of a sum of squares computed from 
the general random effects model (18.3) is expressed in matrix notation. Any sum of squares 
that is computed using an equation or is extracted via computer software can always be 
represented as a quadratic form in the data as 


$$Q = y'Ay \tag{18.5}$$

TABLE 18.1
Analysis of Variance Table for One-Way Random Effects Model

Source | df | Sum of Squares | Expected Mean Square
Between treatments or populations | $t - 1$ | $Q_2 = \sum_{i=1}^{t} n_i (\bar{y}_{i.} - \bar{y}_{..})^2$ | $\sigma_\varepsilon^2 + \frac{1}{t - 1} \left( N - \frac{\sum_{i=1}^{t} n_i^2}{N} \right) \sigma_u^2$
Within error | $N - t$ | $Q_1 = \sum_{i=1}^{t} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i.})^2$ | $\sigma_\varepsilon^2$


where $y$ is the vector of observations and $A$ is an appropriately chosen symmetric matrix
of constants called the matrix of the quadratic form (Graybill, 1976). As an example, the
sample variance of a vector of $n$ observations is

$$s^2 = \sum_{i=1}^{n} \frac{(y_i - \bar{y})^2}{n - 1} = y' \left[ \frac{1}{n - 1} \left( I_n - \frac{1}{n} J_n \right) \right] y$$

where $I_n$ is an $n \times n$ identity matrix, $J_n$ is an $n \times n$ matrix of ones, and the matrix of the
quadratic form $y'Ay$ is

$$A = \frac{1}{n - 1} \left( I_n - \frac{1}{n} J_n \right)$$


For different models, certain choices of A always exist that yield the desired sums of squares, 
but fortunately, as will be seen shortly, it is not necessary to know the elements in A, nor even 
to know how to determine the elements of matrix A. You only need to know that A exists. 
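The sample variance example can be verified directly. The following sketch, which uses made-up data and hypothetical helper names, builds $A = (I_n - J_n/n)/(n - 1)$ with exact fractions and compares $y'Ay$ with the usual formula for $s^2$.

```python
from fractions import Fraction

def variance_matrix(n):
    """A = (I_n - J_n / n) / (n - 1), the matrix of the quadratic form for s^2."""
    return [[(Fraction(int(r == c)) - Fraction(1, n)) / (n - 1)
             for c in range(n)] for r in range(n)]

def quadratic_form(y, A):
    """Compute y' A y."""
    n = len(y)
    return sum(y[r] * A[r][c] * y[c] for r in range(n) for c in range(n))

y = [2, 4, 4, 7]                                   # made-up data
A = variance_matrix(len(y))
mean = Fraction(sum(y), len(y))
s2_direct = sum((v - mean) ** 2 for v in y) / (len(y) - 1)
s2_quadratic = quadratic_form(y, A)
row_sums = [sum(row) for row in A]                 # all zero, so j'Aj = 0
print(s2_direct, s2_quadratic, row_sums)
```

Every row of $A$ sums to zero, which is the property that removes the term involving the mean from the expectation of the quadratic form.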

For the general random effects model (18.3) and its corresponding covariance matrix $\Sigma$,
the expectation of a quadratic form (Graybill, 1976) is

$$E(y'Ay) = \operatorname{Tr}[A\Sigma] + \mu^2 j_N' A j_N \tag{18.6}$$

where $\operatorname{Tr}[B] = \sum_{i=1}^{n} b_{ii}$ and $b_{ii}$, $i = 1, 2, \ldots, n$, denote the diagonal elements of the square
matrix $B$. The sums of squares in the analysis of variance are constructed such that $\mu^2 j_N' A j_N = 0$.
Thus the expectations of the sums of squares do not depend on $\mu^2 j_N' A j_N$ and are given by

$$E(y'Ay) = \operatorname{Tr}[A\Sigma] \tag{18.7}$$

The covariance matrix of the general random effects model in Equation 18.3 is

$$\Sigma = \sigma_1^2 Z_1 Z_1' + \sigma_2^2 Z_2 Z_2' + \cdots + \sigma_r^2 Z_r Z_r' + \sigma_\varepsilon^2 I_N$$

Thus, the expectation of the quadratic form $y'Ay$ is

$$\begin{aligned}
E(y'Ay) = \operatorname{Tr}[A\Sigma] &= \operatorname{Tr}[A(\sigma_1^2 Z_1 Z_1' + \sigma_2^2 Z_2 Z_2' + \cdots + \sigma_r^2 Z_r Z_r' + \sigma_\varepsilon^2 I_N)] \\
&= \sigma_1^2 \operatorname{Tr}[A Z_1 Z_1'] + \sigma_2^2 \operatorname{Tr}[A Z_2 Z_2'] + \cdots + \sigma_r^2 \operatorname{Tr}[A Z_r Z_r'] + \sigma_\varepsilon^2 \operatorname{Tr}[A]
\end{aligned}$$

Therefore, the coefficient of $\sigma_\varepsilon^2$ is $\operatorname{Tr}[A]$ and this is equal to the number of degrees of freedom
associated with the sum of squares $y'Ay$. The coefficient of $\sigma_s^2$ is $\operatorname{Tr}[A Z_s Z_s']$ for $s = 1, 2, \ldots, r$.

One property of the trace operator is that $\operatorname{Tr}[A Z_s Z_s'] = \operatorname{Tr}[Z_s' A Z_s]$, where $\operatorname{Tr}[Z_s' A Z_s]$ is
the sum of the diagonal elements of $Z_s' A Z_s$, or $\operatorname{Tr}[Z_s' A Z_s] = \sum_{j=1}^{t_s} z_{sj}' A z_{sj}$, since there are $t_s$
columns in $Z_s$. But $z_{sj}' A z_{sj}$ is the same sum of squares as $y'Ay$ except that the column vector
$z_{sj}$ is used as data in place of $y$. Hence, if you have an equation or a computer program that
calculates $y'Ay$, it can also be used to calculate $z_{sj}' A z_{sj}$. Thus, the coefficient of $\sigma_s^2$ in the
expectation of $y'Ay$ is the sum $z_{s1}' A z_{s1} + z_{s2}' A z_{s2} + \cdots + z_{st_s}' A z_{st_s}$.

If the elements of $A$ are known, then the above sums of squares can be evaluated explicitly.
If $A$ is not known, which is likely since a computer code is probably used to calculate
$y'Ay$, each $z_{sj}' A z_{sj}$ can be computed by having the computer calculate the sum of squares
where the column $z_{sj}$ is used as the data vector (instead of $y$). Thus, the sum of squares
must be computed for each column of the matrix of design matrices from all of the random
effects in the model, $[Z_1, Z_2, \ldots, Z_r]$, as if it were data, and then the expectation of the sum of
squares $y'Ay$ is evaluated as

$$E(y'Ay) = \nu \sigma_\varepsilon^2 + \left( \sum_{j=1}^{t_1} z_{1j}' A z_{1j} \right) \sigma_1^2 + \left( \sum_{j=1}^{t_2} z_{2j}' A z_{2j} \right) \sigma_2^2 + \cdots + \left( \sum_{j=1}^{t_r} z_{rj}' A z_{rj} \right) \sigma_r^2$$

where $\nu$ is the number of degrees of freedom associated with $y'Ay$.
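The trace identity behind the method of synthesis can be checked in a case where $A$ is known. The sketch below assumes a one-way layout with group sizes 2 and 3 and takes $A = I_N - J_N/N$, the matrix of the corrected total sum of squares; both choices, and the helper names, are illustrative assumptions and not part of the text.

```python
from fractions import Fraction

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def trace(X):
    return sum(X[i][i] for i in range(len(X)))

n = [2, 3]                                  # hypothetical group sizes
N = sum(n)
# Rows of Z are observations; columns are group indicators.
Z = [[int(g == i) for g in range(len(n))]
     for i in range(len(n)) for _ in range(n[i])]
# A = I_N - J_N / N gives the corrected total sum of squares y'Ay.
A = [[Fraction(int(r == c)) - Fraction(1, N) for c in range(N)]
     for r in range(N)]

trace_form = trace(matmul(A, matmul(Z, transpose(Z))))       # Tr[A Z Z']
column_form = sum(                                            # sum_j z_j' A z_j
    sum(z[r] * A[r][c] * z[c] for r in range(N) for c in range(N))
    for z in transpose(Z))
print(trace_form, column_form)
```

Both routes give the same coefficient, which is why a routine that can only evaluate the sum of squares on a data vector is enough to obtain the coefficient of each variance component.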

To help demonstrate the idea of synthesis, the expectations of the sums of squares $Q_1$
and $Q_2$ are recomputed for the one-way random effects model. First, a specific model with
$t = 4$ and $n_i = 4$ for each $i$ is used to show how to compute the expectations and then the
expectations are computed for the general one-way random effects model.

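The matrix form for a model to describe the yield of four randomly selected varieties of
wheat in a completely randomized design structure with four replications is

$$
\begin{bmatrix} y_{11} \\ y_{12} \\ y_{13} \\ y_{14} \\ y_{21} \\ y_{22} \\ y_{23} \\ y_{24} \\ y_{31} \\ y_{32} \\ y_{33} \\ y_{34} \\ y_{41} \\ y_{42} \\ y_{43} \\ y_{44} \end{bmatrix}
=
\begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} \mu
+
\begin{bmatrix}
1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \end{bmatrix}
+
\begin{bmatrix} \varepsilon_{11} \\ \varepsilon_{12} \\ \varepsilon_{13} \\ \varepsilon_{14} \\ \varepsilon_{21} \\ \varepsilon_{22} \\ \varepsilon_{23} \\ \varepsilon_{24} \\ \varepsilon_{31} \\ \varepsilon_{32} \\ \varepsilon_{33} \\ \varepsilon_{34} \\ \varepsilon_{41} \\ \varepsilon_{42} \\ \varepsilon_{43} \\ \varepsilon_{44} \end{bmatrix}
$$

or $y = j_{16}\mu + [z_1 \; z_2 \; z_3 \; z_4] u + \varepsilon$.

The expectation of $Q_1$, the within sum of squares, is

$$E(Q_1) = E(y' A_W y) = \sigma_u^2 \left[ \sum_{j=1}^{4} z_j' A_W z_j \right] + 12 \sigma_\varepsilon^2$$

where 12 is the degrees of freedom associated with $Q_1$ and $A_W$ denotes the matrix of the
quadratic form for the within sum of squares $Q_1$. To obtain the coefficient of $\sigma_u^2$, compute the
within sum of squares using $z_1$ as data, compute $Q_1$ using $z_2$ as data, compute $Q_1$ using $z_3$
as data, and compute $Q_1$ using $z_4$ as data. The within sum of squares for column $z_1$ is

$$Q_1(z_1) = \sum_{i=1}^{4} \sum_{j=1}^{4} z_{1ij}^2 - 4 \sum_{i=1}^{4} \bar{z}_{1i.}^2 = 4 - 4(1) = 0$$

Likewise, the values of $Q_1(z_2)$, $Q_1(z_3)$, and $Q_1(z_4)$ are also zero, which implies that the
coefficient of $\sigma_u^2$ in $E(Q_1)$ is zero; thus, $E(Q_1) = 12\sigma_\varepsilon^2$.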

The expectation of $Q_2$, the between sum of squares, is

$$E(Q_2) = E(y' A_B y) = \sigma_u^2 \left[ \sum_{j=1}^{4} z_j' A_B z_j \right] + 3 \sigma_\varepsilon^2$$

where

$$Q_2 = \sum_{i=1}^{t} n_i (\bar{y}_{i.} - \bar{y}_{..})^2 = \sum_{i=1}^{t} n_i \bar{y}_{i.}^2 - n_{.} \bar{y}_{..}^2$$

and 3 is the number of degrees of freedom associated with $Q_2$; $A_B$ denotes the matrix of the
quadratic form for the between sum of squares $Q_2$. To compute the coefficient of $\sigma_u^2$, compute
the between sum of squares $Q_2$ using each column $z_1$, $z_2$, $z_3$, and $z_4$ and add these
four values together. The value of the between sum of squares computed by using $z_1$ as the
data is

$$Q_2(z_1) = 4 \sum_{i=1}^{4} \bar{z}_{1i.}^2 - 16 \bar{z}_{1..}^2 = 4(1^2 + 0^2 + 0^2 + 0^2) - 16(0.25)^2 = 3$$

The values of $Q_2(z_2)$, $Q_2(z_3)$, and $Q_2(z_4)$ are also equal to 3; thus, the coefficient of $\sigma_u^2$ is
$3 + 3 + 3 + 3 = 12$. Using these values, the expression for $E(Q_2)$ is

$$E(Q_2) = 12\sigma_u^2 + 3\sigma_\varepsilon^2$$

Next, Hartley's method of synthesis is used to compute the expectations of the between
and within sums of squares for the general one-way random effects model in Equation
18.1. The matrix form of the model is

$$y = j_N \mu + [z_1, z_2, \ldots, z_t] u + \varepsilon$$
The within or error sum of squares is

$$Q_1 = \sum_{i=1}^{t} \sum_{j=1}^{n_i} y_{ij}^2 - \sum_{i=1}^{t} n_i \bar{y}_{i.}^2 = y' A_W y$$

and its expectation is

$$E(Q_1) = \sigma_u^2 \sum_{i=1}^{t} z_i' A_W z_i + (N - t) \sigma_\varepsilon^2$$

where $N = \sum_{i=1}^{t} n_i$ and $A_W$ is the matrix of the quadratic form for the within sum of squares.
The value of $Q_1$ when using the first column, $z_1$, as data is

$$Q_1(z_1) = \sum_{i=1}^{t} \sum_{j=1}^{n_i} z_{1ij}^2 - \sum_{i=1}^{t} n_i \bar{z}_{1i.}^2 = n_1 - [n_1(1)^2 + n_2(0)^2 + \cdots + n_t(0)^2] = 0$$

Likewise, the values of $Q_1(z_2), Q_1(z_3), \ldots, Q_1(z_t)$ are also zero; thus, the coefficient of $\sigma_u^2$ in
$E(Q_1)$ is zero, implying that $E(Q_1) = (N - t)\sigma_\varepsilon^2$, where there are $(N - t)$ degrees of
freedom associated with $Q_1$.

The between sum of squares is

$$Q_2 = \sum_{i=1}^{t} n_i \bar{y}_{i.}^2 - n_{.} \bar{y}_{..}^2 = y' A_B y$$

and its expectation is

$$E(Q_2) = \sigma_u^2 \sum_{i=1}^{t} z_i' A_B z_i + (t - 1) \sigma_\varepsilon^2$$

where $A_B$ is the matrix of the quadratic form for the between sum of squares. The value of
$Q_2$ using the first column, $z_1$, as data is

$$Q_2(z_1) = \sum_{i=1}^{t} n_i \bar{z}_{1i.}^2 - N \bar{z}_{1..}^2$$

For column $z_1$, $\bar{z}_{11.} = 1$, $\bar{z}_{12.} = \bar{z}_{13.} = \cdots = \bar{z}_{1t.} = 0$, and $\bar{z}_{1..} = n_1 / N$. Thus,

$$Q_2(z_1) = n_1(1)^2 + n_2(0)^2 + \cdots + n_t(0)^2 - N(n_1/N)^2 = n_1 - \frac{n_1^2}{N}$$

Similarly, the values of $Q_2$ using the other columns of $Z$ as data are

$$Q_2(z_2) = n_2 - \frac{n_2^2}{N}, \quad Q_2(z_3) = n_3 - \frac{n_3^2}{N}, \quad \ldots, \quad Q_2(z_t) = n_t - \frac{n_t^2}{N}$$

Combining these results, the expectation of the between sum of squares is

$$E(Q_2) = (t - 1)\sigma_\varepsilon^2 + \left[ \sum_{i=1}^{t} \left( n_i - \frac{n_i^2}{N} \right) \right] \sigma_u^2 = (t - 1)\sigma_\varepsilon^2 + \left( N - \frac{\sum_{i=1}^{t} n_i^2}{N} \right) \sigma_u^2$$

The expectations of $Q_1$ and $Q_2$ obtained via Hartley's method of synthesis are equivalent
to those obtained using the algebraic technique.
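The synthesis computations for the one-way model can be sketched in a few lines. In the Python sketch below, `within_ss` and `between_ss` play the role of the routine that computes a sum of squares from a data vector, and each indicator column of $Z_1$ is passed to them as if it were data. The function names and the unbalanced sizes (2, 3, 5) are hypothetical choices for illustration.

```python
from fractions import Fraction

def within_ss(data, groups):
    """Q1: sum over groups of squared deviations from the group mean."""
    total = Fraction(0)
    for g in set(groups):
        vals = [Fraction(v) for v, lab in zip(data, groups) if lab == g]
        mean = sum(vals) / len(vals)
        total += sum((v - mean) ** 2 for v in vals)
    return total

def between_ss(data, groups):
    """Q2: sum of n_i * (group mean - grand mean)^2."""
    grand = Fraction(sum(data), len(data))
    total = Fraction(0)
    for g in set(groups):
        vals = [Fraction(v) for v, lab in zip(data, groups) if lab == g]
        total += len(vals) * (sum(vals) / len(vals) - grand) ** 2
    return total

def synthesis_coefficient(ss, groups):
    """Sum of ss(z_i) over the indicator columns z_i of Z1."""
    return sum(ss([int(lab == g) for lab in groups], groups)
               for g in sorted(set(groups)))

def labels(n):
    """Group label for each observation, given the group sizes."""
    return [i for i, size in enumerate(n) for _ in range(size)]

balanced = labels([4, 4, 4, 4])      # the t = 4, n_i = 4 example
unbalanced = labels([2, 3, 5])       # assumed sizes for illustration
print(synthesis_coefficient(within_ss, balanced),
      synthesis_coefficient(between_ss, balanced),
      synthesis_coefficient(between_ss, unbalanced))
```

For the balanced layout the coefficients of $\sigma_u^2$ are 0 for $Q_1$ and 12 for $Q_2$, and for the assumed unbalanced layout the $Q_2$ coefficient equals $N - \sum n_i^2 / N$.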

Next, Hartley's method of synthesis is used to evaluate the expectation of sums of squares 
for a model used to describe a two-way random effects treatment structure with interaction 
in a completely randomized design structure. In this case, both the row treatments and 
column treatments are random effects. Data from this experiment can be modeled by 

$$y_{ijk} = \mu + a_i + b_j + c_{ij} + \varepsilon_{ijk}, \quad i = 1, 2, \ldots, s, \; j = 1, 2, \ldots, t, \; k = 1, 2, \ldots, n_{ij}$$

where the $a_i$, $i = 1, 2, \ldots, s$, denote the random effects corresponding to the rows with
distributions that are i.i.d. $N(0, \sigma_a^2)$; the $b_j$, $j = 1, 2, \ldots, t$, denote the random effects
corresponding to the columns with distributions that are i.i.d. $N(0, \sigma_b^2)$; the $c_{ij}$ denote the random
effects of the row-column combinations with distributions that are i.i.d. $N(0, \sigma_c^2)$; and the $\varepsilon_{ijk}$
denote the experimental unit errors with distributions that are i.i.d. $N(0, \sigma_\varepsilon^2)$. The schematic
in Figure 18.1 represents data from an unbalanced two-way treatment structure in a
completely randomized design structure where there are two or three observations per cell.

                     Column treatments
                     1                    2                    3
Row treatments  1    y111 y112 y113       y121 y122            y131 y132
                2    y211 y212            y221 y222 y223       y231 y232

FIGURE 18.1 Example of two-way random effects treatment structure.

In matrix notation, the model is $y = j_{14}\mu + Z_1 a + Z_2 b + Z_3 c + \varepsilon$, where the parameters of the model are $\mu$, $\sigma_a^2$, $\sigma_b^2$, $\sigma_c^2$, and $\sigma_\varepsilon^2$.

Sums of squares for this unbalanced two-way random effects model can be computed in 
several different ways (see Chapters 9 and 10). In the analysis for this model, there are four 
variance components and thus one needs four different sums of squares. To demonstrate 
the method of synthesis, four sums of squares have been selected that correspond to the 
balanced case of SSROWS, SSCOLUMNS, SSINTERACTION, and SSERROR but have been 
modified for the unequal sample sizes (which correspond to Henderson's type I sums of 
squares, discussed later in this section). The four sums of squares are 

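$$Q_1 = \sum_{i=1}^{s} \frac{y_{i..}^2}{n_{i.}} - \frac{y_{...}^2}{n_{..}} \quad \text{(SSROWS)}$$

$$Q_2 = \sum_{j=1}^{t} \frac{y_{.j.}^2}{n_{.j}} - \frac{y_{...}^2}{n_{..}} \quad \text{(SSCOLUMNS)}$$

$$Q_3 = \sum_{i=1}^{s} \sum_{j=1}^{t} \frac{y_{ij.}^2}{n_{ij}} - Q_1 - Q_2 - \frac{y_{...}^2}{n_{..}} \quad \text{(SSINTERACTION)}$$

and

$$Q_4 = \sum_{i=1}^{s} \sum_{j=1}^{t} \sum_{k=1}^{n_{ij}} (y_{ijk} - \bar{y}_{ij.})^2 \quad \text{(SSERROR)}$$

where $y_{i..}$, $y_{.j.}$, $y_{ij.}$, and $y_{...}$ denote totals. For the data structure in Figure 18.1,

$$Q_2 = \frac{y_{.1.}^2}{5} + \frac{y_{.2.}^2}{5} + \frac{y_{.3.}^2}{4} - \frac{y_{...}^2}{14}$$

and $E(Q_2)$ has the form $E(Q_2) = k_1 \sigma_a^2 + k_2 \sigma_b^2 + k_3 \sigma_c^2 + 2\sigma_\varepsilon^2$ for some values of $k_1$, $k_2$, and $k_3$, where
2 is the number of degrees of freedom associated with $Q_2$. The next step is to use Hartley's
method of synthesis to determine $k_1$, $k_2$, and $k_3$. To determine the value of $k_1$, compute $Q_2$ for
each of the two columns of $Z_1$. The value of $Q_2$ using the first column of $Z_1$ as data is

$$Q_2(z_{11}) = \frac{3^2}{5} + \frac{2^2}{5} + \frac{2^2}{4} - \frac{7^2}{14} = 0.1$$

and the value of $Q_2$ using the second column of $Z_1$ as data is

$$Q_2(z_{12}) = \frac{2^2}{5} + \frac{3^2}{5} + \frac{2^2}{4} - \frac{7^2}{14} = 0.1$$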

Thus, $k_1 = Q_2(z_{11}) + Q_2(z_{12}) = 0.1 + 0.1 = 0.2$. To determine the value of $k_2$, compute $Q_2$ for each
column of $Z_2$ as

$$Q_2(z_{21}) = \frac{5^2}{5} + \frac{0^2}{5} + \frac{0^2}{4} - \frac{5^2}{14} = 3.214$$

$$Q_2(z_{22}) = \frac{0^2}{5} + \frac{5^2}{5} + \frac{0^2}{4} - \frac{5^2}{14} = 3.214$$

and

$$Q_2(z_{23}) = \frac{0^2}{5} + \frac{0^2}{5} + \frac{4^2}{4} - \frac{4^2}{14} = 2.857$$

The value of $k_2$ is $k_2 = Q_2(z_{21}) + Q_2(z_{22}) + Q_2(z_{23}) = 3.214 + 3.214 + 2.857 = 9.285$.

The value of $k_3$ is obtained by computing $Q_2$ for each column of $Z_3$. The values of $Q_2$ are

$$Q_2(z_{311}) = \frac{3^2}{5} + \frac{0^2}{5} + \frac{0^2}{4} - \frac{3^2}{14} = 1.157$$

$$Q_2(z_{312}) = \frac{0^2}{5} + \frac{2^2}{5} + \frac{0^2}{4} - \frac{2^2}{14} = 0.514$$

$$Q_2(z_{313}) = \frac{0^2}{5} + \frac{0^2}{5} + \frac{2^2}{4} - \frac{2^2}{14} = 0.714$$

$$Q_2(z_{321}) = \frac{2^2}{5} + \frac{0^2}{5} + \frac{0^2}{4} - \frac{2^2}{14} = 0.514$$

$$Q_2(z_{322}) = \frac{0^2}{5} + \frac{3^2}{5} + \frac{0^2}{4} - \frac{3^2}{14} = 1.157$$

and

$$Q_2(z_{323}) = \frac{0^2}{5} + \frac{0^2}{5} + \frac{2^2}{4} - \frac{2^2}{14} = 0.714$$

Thus, the value of $k_3$ is computed as $k_3 = 1.157 + 0.514 + 0.714 + 0.514 + 1.157 + 0.714 = 4.770$.
Using the above values for $k_1$, $k_2$, and $k_3$, the expectation of SSCOLUMNS is

$$E(Q_2) = 0.200\sigma_a^2 + 9.285\sigma_b^2 + 4.770\sigma_c^2 + 2\sigma_\varepsilon^2$$

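Similarly, applying Hartley's method of synthesis, the expectations of $Q_1$ and $Q_3$ are
determined to be

$$E(Q_1) = 7.000\sigma_a^2 + 0.143\sigma_b^2 + 2.429\sigma_c^2 + (1)\sigma_\varepsilon^2$$

and

$$E(Q_3) = -0.200\sigma_a^2 - 0.143\sigma_b^2 + 4.371\sigma_c^2 + 2\sigma_\varepsilon^2$$

where the small negative coefficients of $\sigma_a^2$ and $\sigma_b^2$ arise from the unequal cell sizes.
In general, the expectation of the SSERROR is equal to the number of degrees of freedom
associated with SSERROR times $\sigma_\varepsilon^2$, which in this case provides

$$E(Q_4) = E(\text{SSERROR}) = 8\sigma_\varepsilon^2$$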
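The synthesis calculations for the data structure in Figure 18.1 can be reproduced with exact arithmetic. The sketch below is one possible arrangement, with hypothetical helper names: `henderson_one` evaluates $Q_1$, $Q_2$, and $Q_3$ (as defined above, with totals squared and divided by the corresponding counts) for any data vector laid out like Figure 18.1, and `synthesis` feeds it each indicator column of $Z_1$, $Z_2$, and $Z_3$ in turn.

```python
from fractions import Fraction

counts = [[3, 2, 2], [2, 3, 2]]          # n_ij from Figure 18.1
cells = [(i, j) for i, row in enumerate(counts)
         for j, n in enumerate(row) for _ in range(n)]

def uncorrected(data, key):
    """Sum over groups of (group total)^2 / (group size)."""
    totals, sizes = {}, {}
    for value, cell in zip(data, cells):
        k = key(cell)
        totals[k] = totals.get(k, 0) + value
        sizes[k] = sizes.get(k, 0) + 1
    return sum(Fraction(totals[k] ** 2, sizes[k]) for k in totals)

def henderson_one(data):
    """Return (Q1, Q2, Q3): rows, columns, interaction."""
    cf = uncorrected(data, lambda c: 0)           # y...^2 / n..
    rows = uncorrected(data, lambda c: c[0])
    cols = uncorrected(data, lambda c: c[1])
    cell = uncorrected(data, lambda c: c)
    return rows - cf, cols - cf, cell - rows - cols + cf

def synthesis(levels, key):
    """Sum (Q1, Q2, Q3) over the indicator columns defined by `key`."""
    sums = [Fraction(0)] * 3
    for level in levels:
        z = [int(key(c) == level) for c in cells]
        sums = [s + q for s, q in zip(sums, henderson_one(z))]
    return sums

k_a = synthesis(range(2), lambda c: c[0])             # columns of Z1
k_b = synthesis(range(3), lambda c: c[1])             # columns of Z2
k_c = synthesis(sorted(set(cells)), lambda c: c)      # columns of Z3
for name, k in (("sigma_a^2", k_a), ("sigma_b^2", k_b), ("sigma_c^2", k_c)):
    print(name, [round(float(v), 3) for v in k])
```

Each printed row lists the coefficients of one variance component in $E(Q_1)$, $E(Q_2)$, and $E(Q_3)$. The exact values differ in the third decimal place from sums of rounded terms; for example, the coefficient of $\sigma_c^2$ in $E(Q_2)$ is $167/35 \approx 4.771$.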

There are various ways to compute sums of squares for unbalanced treatment structures,
including the type I-IV sums of squares of SAS®-GLM, and type I-III from SAS-MIXED,
as well as four methods due to Henderson. The computations of the expectations
of the type I-III from SAS-GLM are shown below as well as a demonstration of how to use
SAS to provide the computations of the expectations of sums of squares. Table 18.2 contains
the SAS code to generate the columns of $Z_1$ (denoted by a1 and a2), $Z_2$ (denoted by b1,
b2, and b3), and $Z_3$ (denoted by c11, c12, c13, c21, c22, and c23) and compute the three types of


TABLE 18.2
SAS Code to Evaluate the Type I-III Sums of Squares for Each Column of [Z1, Z2, Z3] and y for
Data Structure in Figure 18.1 and Data in Table 19.2

data ex_18; input row col y @@;
a1=(row=1); a2=(row=2); b1=(col=1); b2=(col=2); b3=(col=3);
c11=a1*b1; c12=a1*b2; c13=a1*b3; c21=a2*b1; c22=a2*b2; c23=a2*b3;
datalines;
1 1 10 1 1 12 1 1 11 1 2 13 1 2 15 1 3 21 1 3 19
2 1 16 2 1 18 2 2 13 2 2 19 2 2 14 2 3 11 2 3 13
;
proc glm data=ex_18;
class row col;
model y a1 a2 b1 b2 b3 c11 c12 c13 c21 c22 c23=row col row*col / ss1
ss2 ss3 e1 e2 e3;
random row col row*col;
run;

TABLE 18.3
Type I-III Sums of Squares for Each Column of [Z1, Z2, Z3] and y for Data Structure in
Figure 18.1 and Data in Table 19.2

SAS Sums of Squares

         Type I                          Type II                         Type III
Effect   Row      Column   Interaction   Row      Column   Interaction   Row      Column   Interaction
a1       3.50000  0.0000   0.000         3.40000  0.0000   0.000         3.37500  0.00000  0.000
a2       3.50000  0.0000   0.000         3.40000  0.0000   0.000         3.37500  0.00000  0.000
b1       0.07143  3.1429   0.000         0.00000  3.1429   0.000         0.00000  3.10588  0.000
b2       0.07143  3.1429   0.000         0.00000  3.1429   0.000         0.00000  3.10588  0.000
b3       0.00000  2.8571   0.000         0.00000  2.8571   0.000         0.00000  2.82353  0.000
c11      0.64286  0.9378   0.776         0.42353  0.9378   0.776         0.37500  0.77647  0.776
c12      0.28571  0.6521   0.776         0.42353  0.6521   0.776         0.37500  0.77647  0.776
c13      0.28571  0.7227   0.706         0.29412  0.7227   0.706         0.37500  0.70588  0.706
c21      0.28571  0.6521   0.776         0.42353  0.6521   0.776         0.37500  0.77647  0.776
c22      0.64286  0.9378   0.776         0.42353  0.9378   0.776         0.37500  0.77647  0.776
c23      0.28571  0.7227   0.706         0.29412  0.7227   0.706         0.37500  0.70588  0.706
y        0.64286  14.7597  109.145       0.18824  14.7597  109.145       0.16667  8.90980  109.145


sums of squares for each column using SAS-GLM. The sums of squares for each of the
columns are displayed in Table 18.3. The expectation of the type I sum of squares due to
rows is computed as

$$\begin{aligned}
E[\text{SSROW(I)}] &= (1)\sigma_\varepsilon^2 + (0.6429 + 0.2857 + 0.2857 + 0.2857 + 0.6429 + 0.2857)\sigma_c^2 \\
&\quad + (0.0714 + 0.0714 + 0.000)\sigma_b^2 + (3.500 + 3.500)\sigma_a^2 \\
&= \sigma_\varepsilon^2 + 2.4286\sigma_c^2 + 0.1428\sigma_b^2 + 7\sigma_a^2
\end{aligned}$$

The sums of the sums of squares for the respective effects are displayed in Table 18.4. 
Thus, Table 18.5 contains the expectations of the remaining sums of squares that were 
computed using the summary displayed in Table 18.4. The coefficients can be computed 


for each expected sum of squares where the coefficient of $\sigma_\varepsilon^2$ is the number of degrees of
freedom associated with the respective sum of squares.

TABLE 18.4
Sums of the Type I-III Sums of Squares for Each Column of [Z1, Z2, Z3] for Each of the Effects
Using Data from Table 18.3

SAS Sums of Squares

            Type I                          Type II                         Type III
Effect      Row      Column   Interaction   Row      Column   Interaction   Row      Column   Interaction
A Columns   7.00000  0.0000   0.000         6.80000  0.0000   0.000         6.75000  0.00000  0.000
B Columns   0.14286  9.1429   0.000         0.00000  9.1429   0.000         0.00000  9.03529  0.000
C Columns   2.42857  4.6252   4.518         2.28235  4.6252   4.518         2.25000  4.51765  4.518

TABLE 18.5
Expected Values of the SAS Type I-III Sums of Squares,
Computed Using Hartley's Method of Synthesis

E[SSROWS(I)] = $\sigma_\varepsilon^2 + 7.0\sigma_a^2 + 0.1429\sigma_b^2 + 2.4286\sigma_c^2$
E[SSCOLUMNS(I)] = $2\sigma_\varepsilon^2 + 9.1429\sigma_b^2 + 4.6252\sigma_c^2$
E[SSINTERACTION(I)] = $2\sigma_\varepsilon^2 + 4.518\sigma_c^2$
E[SSERROR(I)] = $8\sigma_\varepsilon^2$
E[SSROWS(II)] = $\sigma_\varepsilon^2 + 6.8\sigma_a^2 + 2.2824\sigma_c^2$
E[SSCOLUMNS(II)] = $2\sigma_\varepsilon^2 + 9.1429\sigma_b^2 + 4.6252\sigma_c^2$
E[SSINTERACTION(II)] = $2\sigma_\varepsilon^2 + 4.518\sigma_c^2$
E[SSERROR(II)] = $8\sigma_\varepsilon^2$
E[SSROWS(III)] = $\sigma_\varepsilon^2 + 6.75\sigma_a^2 + 2.25\sigma_c^2$
E[SSCOLUMNS(III)] = $2\sigma_\varepsilon^2 + 9.0353\sigma_b^2 + 4.5177\sigma_c^2$
E[SSINTERACTION(III)] = $2\sigma_\varepsilon^2 + 4.518\sigma_c^2$
E[SSERROR(III)] = $8\sigma_\varepsilon^2$
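The type I entries of Table 18.4 can be reproduced without SAS. The sketch below is an independent Python illustration, not the SAS-GLM algorithm: it obtains sequential (type I) sums of squares by orthogonalizing the indicator columns in the order mean, rows, columns, interaction with exact Gram-Schmidt steps, and then applies the method of synthesis. All function names are hypothetical.

```python
from fractions import Fraction

counts = [[3, 2, 2], [2, 3, 2]]                      # n_ij from Figure 18.1
cells = [(i, j) for i, row in enumerate(counts)
         for j, n in enumerate(row) for _ in range(n)]

def indicators(key):
    """Indicator columns, one per distinct value of key(cell)."""
    levels = sorted(set(key(c) for c in cells))
    return [[Fraction(int(key(c) == lev)) for c in cells] for lev in levels]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def extend(basis, columns):
    """Gram-Schmidt: nonzero parts of `columns` orthogonal to `basis`."""
    new = []
    for col in columns:
        v = list(col)
        for b in basis + new:
            coef = dot(v, b) / dot(b, b)
            v = [vi - coef * bi for vi, bi in zip(v, b)]
        if any(v):                                   # skip dependent columns
            new.append(v)
    return new

Z1 = indicators(lambda c: c[0])
Z2 = indicators(lambda c: c[1])
Z3 = indicators(lambda c: c)
mean_basis = [[Fraction(1)] * len(cells)]
row_basis = extend(mean_basis, Z1)
col_basis = extend(mean_basis + row_basis, Z2)
int_basis = extend(mean_basis + row_basis + col_basis, Z3)

def type1(data):
    """Type I sums of squares (rows, columns, interaction) for a data vector."""
    return [sum(dot(data, b) ** 2 / dot(b, b) for b in basis)
            for basis in (row_basis, col_basis, int_basis)]

def synthesis(Z):
    """Sum type1(z) over the columns z of Z."""
    return [sum(vals) for vals in zip(*(type1(z) for z in Z))]

table_18_4_type1 = [synthesis(Z) for Z in (Z1, Z2, Z3)]
for label, row in zip(("A Columns", "B Columns", "C Columns"), table_18_4_type1):
    print(label, [round(float(v), 4) for v in row])
```

Passing the response vector of Table 18.2 to `type1` in place of an indicator column would give the type I sums of squares for `y` in the same way.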

There are several alternative methods for computing sums of squares (with names 
attached) in the statistical literature that are discussed next. The discussion contains 
indications as to when the techniques can be applied to mixed effects models as well as 
random effects models. Henderson (1953) introduced four methods of computing sums of 
squares, called Henderson's methods I, II, III, and IV (also see Searle, 1987; Henderson, 
1984). The following discussion considers Henderson's methods I and III. The analysis of 
variance method, or Henderson's method I, is appropriate only for random effects models 
and is a technique that consists of computing sums of squares analogous to those computed
for balanced data sets except that they are altered to account for unequal numbers
of observations per treatment combination or unbalanced data sets. Henderson's method I 
sum of squares was used above without justification, but that justification follows. The 
two-way classification random effects model is used to demonstrate this method. 

A model that can be used to describe the balanced two-way data set is 

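$$y_{ijk} = \mu + a_i + b_j + c_{ij} + \varepsilon_{ijk}, \quad i = 1, 2, \ldots, s, \; j = 1, 2, \ldots, t, \; k = 1, 2, \ldots, n$$

where

$$a_i \sim \text{i.i.d. } N(0, \sigma_a^2), \quad b_j \sim \text{i.i.d. } N(0, \sigma_b^2), \quad c_{ij} \sim \text{i.i.d. } N(0, \sigma_c^2), \quad \text{and} \quad \varepsilon_{ijk} \sim \text{i.i.d. } N(0, \sigma_\varepsilon^2)$$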


The sums of squares used in an analysis of variance are 


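$$\text{SSA} = nt \sum_{i=1}^{s} (\bar{y}_{i..} - \bar{y}_{...})^2 = \sum_{i=1}^{s} \frac{y_{i..}^2}{nt} - \frac{y_{...}^2}{nst}$$

$$\text{SSB} = ns \sum_{j=1}^{t} (\bar{y}_{.j.} - \bar{y}_{...})^2 = \sum_{j=1}^{t} \frac{y_{.j.}^2}{ns} - \frac{y_{...}^2}{nst}$$

and

$$\text{SSAB} = n \sum_{i=1}^{s} \sum_{j=1}^{t} (\bar{y}_{ij.} - \bar{y}_{i..} - \bar{y}_{.j.} + \bar{y}_{...})^2 = \sum_{i=1}^{s} \sum_{j=1}^{t} \frac{y_{ij.}^2}{n} - \sum_{i=1}^{s} \frac{y_{i..}^2}{nt} - \sum_{j=1}^{t} \frac{y_{.j.}^2}{ns} + \frac{y_{...}^2}{nst}$$

To convert these sums of squares from equal sample sizes to unequal sample sizes, replace
the products $nst$ with $n_{..}$, $ns$ with $n_{.j}$, and $nt$ with $n_{i.}$, and replace $n$ with $n_{ij}$. Thus, the
sums of squares for unequal sample sizes become

$$\text{SSA} = \sum_{i=1}^{s} \frac{y_{i..}^2}{n_{i.}} - \frac{y_{...}^2}{n_{..}}$$

$$\text{SSB} = \sum_{j=1}^{t} \frac{y_{.j.}^2}{n_{.j}} - \frac{y_{...}^2}{n_{..}}$$

and

$$\text{SSAB} = \sum_{i=1}^{s} \sum_{j=1}^{t} \frac{y_{ij.}^2}{n_{ij}} - \sum_{i=1}^{s} \frac{y_{i..}^2}{n_{i.}} - \sum_{j=1}^{t} \frac{y_{.j.}^2}{n_{.j}} + \frac{y_{...}^2}{n_{..}}$$
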

These are the sums of squares used at the beginning of the discussion of the data structure 
in Figure 18.1. 
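As an illustration, Henderson's method I sums of squares for unequal sample sizes can be evaluated on the observations listed in the datalines of Table 18.2. The Python sketch below is a minimal version with hypothetical names; the `(row, col, y)` triples are transcribed from that table.

```python
from fractions import Fraction

# (row, col, y) triples transcribed from the datalines in Table 18.2.
data = [(1, 1, 10), (1, 1, 12), (1, 1, 11), (1, 2, 13), (1, 2, 15),
        (1, 3, 21), (1, 3, 19), (2, 1, 16), (2, 1, 18), (2, 2, 13),
        (2, 2, 19), (2, 2, 14), (2, 3, 11), (2, 3, 13)]

def uncorrected(key):
    """Sum over groups of (group total)^2 / (group size)."""
    totals, sizes = {}, {}
    for row, col, y in data:
        k = key(row, col)
        totals[k] = totals.get(k, 0) + y
        sizes[k] = sizes.get(k, 0) + 1
    return sum(Fraction(totals[k] ** 2, sizes[k]) for k in totals)

cf = uncorrected(lambda r, c: None)          # y...^2 / n..
rows = uncorrected(lambda r, c: r)           # sum of y_i..^2 / n_i.
cols = uncorrected(lambda r, c: c)           # sum of y_.j.^2 / n_.j
cell = uncorrected(lambda r, c: (r, c))      # sum of y_ij.^2 / n_ij

ssa = rows - cf
ssb = cols - cf
ssab = cell - rows - cols + cf
print(float(ssa), float(ssb), float(ssab))
```

The value of SSA agrees with the type I row sum of squares for `y` in Table 18.3 (0.64286), because both are unadjusted for columns; SSB and SSAB differ from the type I column and interaction sums of squares, which adjust for the terms fitted earlier.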

The fitting-constants method, or Henderson's method III, involves fitting various linear
models to the data and then computing the corresponding sums of squares. The method uses
what is called the reduction in the sums of squares due to fitting the full model and those due
to fitting various submodels (see Chapter 10). To set the notation, consider the model

$$y = X_1 b_1 + X_2 b_2 + X_3 b_3 + \varepsilon$$

The reduction in the total sum of squares due to fitting the full model is

$$R(b_1, b_2, b_3) = y'y - \text{SSERROR}(b_1, b_2, b_3)$$

where $\text{SSERROR}(b_1, b_2, b_3)$ is the residual sum of squares after fitting the full model. The
reduction due to fitting $b_1$ and $b_2$ is $R(b_1, b_2) = y'y - \text{SSERROR}(b_1, b_2)$, where $\text{SSERROR}(b_1, b_2)$
is the residual sum of squares for the model $y = X_1 b_1 + X_2 b_2 + \varepsilon$.

The reduction due to $b_3$ after fitting $b_1$ and $b_2$ is denoted by $R(b_3 \mid b_1, b_2)$ and is given
by $R(b_3 \mid b_1, b_2) = R(b_1, b_2, b_3) - R(b_1, b_2) = \text{SSERROR}(b_1, b_2) - \text{SSERROR}(b_1, b_2, b_3)$. Likewise,
the reduction due to $b_2$ after fitting $b_1$ is $R(b_2 \mid b_1) = R(b_1, b_2) - R(b_1) = \text{SSERROR}(b_1) - \text{SSERROR}(b_1, b_2)$.
Finally, the reduction due to $b_1$ and $b_2$ after fitting $b_3$ is
$R(b_1, b_2 \mid b_3) = R(b_1, b_2, b_3) - R(b_3) = \text{SSERROR}(b_3) - \text{SSERROR}(b_1, b_2, b_3)$.

One of the advantages of using this technique is that $E[R(b_1, b_2 \mid b_3)]$ does not depend on $b_3$
if it is a fixed effect or $\sigma_3^2$ if it is a random effect unless $b_3$ denotes an interaction between
$b_1$ and $b_2$ or an interaction with $b_1$ or $b_2$.

For the model

$$y = j\mu + X_1 b + X_2 t + X_3 g + \varepsilon$$

where b ~ N( 0, of), t ~ N( 0, of), g ~ N( 0, of), and e ~ N( 0, of), one set of possible sums of 
squares (which are type I sums of squares from SAS-GLM and SAS-MIXED) are 

R(b\p) = R(p,b)-R(p) 

R(t\p,b) = R{p,b,t) - R(p, b) 

R(g | P,b,t) = R(p,b,t,g) - R(p, b, t) 

SSERRORfq, b, t, g)] = y'y ~ R(p, b, t,g) 

The expectations of these sums of squares have the forms 

E[SSERROR(^, b, f,g)] = {n- p)o 2 £ 

E[R(g | p, b, t)] = kpjf + k 2 of 
E[R(t\p,b)] = k 3 o 2 e + k 4 of + k 5 o 2 , 


and 


E[R(b | p)] = k h o 2 £ + k 7 o\ + k g of + k 9 o 2 b 

The type I sums of squares given by R(f | p), R(b\p,t),R(g\p, b, t) and SSERRORfq, b, t, g) could 
also be used, and the method of fitting constants can be utilized for both random effects 
and mixed effects models as long as the fixed effects are fit before the random effects. 
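
The sequential reductions can be computed numerically by projecting $y$ onto nested column spaces. The sketch below is one hedged way to do this in plain Python, using Gram-Schmidt orthogonalization so that linearly dependent indicator columns are skipped; the helper names `reduction` and `indicators` and the data are hypothetical, and only two classification factors are fitted to keep it short.

```python
def reduction(columns, y, tol=1e-10):
    """R(.): squared length of the projection of y onto the span of columns."""
    basis = []
    for col in columns:
        v = list(col)
        for q in basis:                      # remove what is already explained
            coef = sum(a * b for a, b in zip(v, q))
            v = [a - coef * b for a, b in zip(v, q)]
        norm_sq = sum(a * a for a in v)
        if norm_sq > tol:                    # skip linearly dependent columns
            norm = norm_sq ** 0.5
            basis.append([a / norm for a in v])
    return sum(sum(a * b for a, b in zip(y, q)) ** 2 for q in basis)


def indicators(labels):
    """One 0/1 column per distinct level of a classification factor."""
    return [[1.0 if lab == lev else 0.0 for lab in labels]
            for lev in sorted(set(labels))]


# Hypothetical data: factor b has 2 levels, factor t has 2 levels.
b = [1, 1, 1, 2, 2, 2, 2]
t = [1, 2, 2, 1, 1, 2, 2]
y = [5.0, 7.0, 6.0, 4.0, 9.0, 8.0, 10.0]

mean_col = [[1.0] * len(y)]
r_mu = reduction(mean_col, y)                                      # R(mu)
r_mu_b = reduction(mean_col + indicators(b), y)                    # R(mu, b)
r_mu_b_t = reduction(mean_col + indicators(b) + indicators(t), y)  # R(mu, b, t)

r_b_given_mu = r_mu_b - r_mu          # R(b | mu)
r_t_given_mu_b = r_mu_b_t - r_mu_b    # R(t | mu, b)
ss_error = sum(v * v for v in y) - r_mu_b_t
print(r_b_given_mu, r_t_given_mu_b, ss_error)
```

Changing the order in which the indicator blocks are appended gives the alternative set $R(t \mid \mu)$, $R(b \mid \mu, t)$.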

The method of synthesis can be used to evaluate the expectations of sums of squares for any 
model involving random effects and/or multiple error terms. A set of sums of squares and 
their expectations (or mean squares) can be used to estimate the variance components, to 
develop tests of hypotheses, and construct confidence intervals about individual variance 
components and/or functions of the variance components. Methods of estimation are discussed
in Chapter 19, and inference techniques are presented in Chapter 20. Many statistical
software programs automatically use Hartley's synthesis to compute expected mean squares.
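
As a rough illustration of synthesis for a one-way random effects model, the coefficient of a variance component in the expectation of a quadratic form can be obtained by evaluating that quadratic form at each column of the corresponding design matrix and adding the results. The sketch below assumes that description of Hartley's method; the function names and the group sizes 3, 4, 4, 2 are chosen only for illustration.

```python
def group_stats(y, groups):
    """Totals and counts of y within each group label."""
    totals, counts = {}, {}
    for value, g in zip(y, groups):
        totals[g] = totals.get(g, 0.0) + value
        counts[g] = counts.get(g, 0) + 1
    return totals, counts


def ss_between(y, groups):
    totals, counts = group_stats(y, groups)
    grand_mean = sum(y) / len(y)
    return sum(counts[g] * (totals[g] / counts[g] - grand_mean) ** 2 for g in totals)


def ss_within(y, groups):
    totals, counts = group_stats(y, groups)
    return sum((value - totals[g] / counts[g]) ** 2 for value, g in zip(y, groups))


def synthesis_coefficients(q, groups):
    """Coefficients of (sigma_e^2, sigma_u^2) in E[q(y)] by synthesis."""
    n = len(groups)
    levels = sorted(set(groups))
    # sigma_u^2: evaluate q at each column of Z (indicator of one level) and add.
    coef_u = sum(q([1.0 if g == lev else 0.0 for g in groups], groups) for lev in levels)
    # sigma_e^2: evaluate q at each column of the identity matrix and add.
    coef_e = sum(q([1.0 if k == m else 0.0 for k in range(n)], groups) for m in range(n))
    return coef_e, coef_u


groups = ["A"] * 3 + ["B"] * 4 + ["C"] * 4 + ["D"] * 2
between_e, between_u = synthesis_coefficients(ss_between, groups)
within_e, within_u = synthesis_coefficients(ss_within, groups)
print(between_e, between_u, within_e, within_u)
```

For this design the results agree with the algebraic forms $t - 1$ and $N - \sum n_i^2/N$ for the between sum of squares and $N - t$ and $0$ for the within sum of squares.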


18.4 Concluding Remarks 

In this chapter, the concepts of random and fixed effects were defined as well as the concept 
of a random effects model. The random effects model is expressed in matrix form in order 
to describe methods for computing the expected mean squares. An unbalanced one-way 
treatment structure in a completely randomized design structure was used to demonstrate 
the algebraic method and Hartley's synthesis method of computing expected mean 
squares. Different methods for computing sums of squares were described, and an unbalanced
two-way treatment structure in a completely randomized design structure was
used to demonstrate the computations of the respective expected mean squares. SAS code
is presented to demonstrate Hartley's method of synthesis.


18.5 Exercises

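18.1 The model for a one-way random effects treatment structure in a randomized
complete block design structure is $y_{ij} = \mu + b_i + a_j + \varepsilon_{ij}$, $i = 1,2,\ldots,n_{blk}$ and $j = 1,2,\ldots,t$,
with assumptions $b_i \sim N(0, \sigma_{block}^2)$, $a_j \sim N(0, \sigma_a^2)$, and $\varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2)$. Use the algebraic
method to evaluate the expectations of the three sums of squares

$$SS(\text{blocks}) = t\sum_{i=1}^{n_{blk}}(\bar{y}_{i.} - \bar{y}_{..})^2, \qquad SS(\text{treatments}) = n_{blk}\sum_{j=1}^{t}(\bar{y}_{.j} - \bar{y}_{..})^2$$

and

$$SS(\text{error}) = \sum_{i=1}^{n_{blk}}\sum_{j=1}^{t}(y_{ij} - \bar{y}_{i.} - \bar{y}_{.j} + \bar{y}_{..})^2$$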

18.2 For the one-way random effects treatment structure in a randomized complete
block design structure model in Exercise 18.1, use $n_{blk} = 5$ and $t = 4$ and evaluate
the expectations of the three sums of squares using Hartley's method of
synthesis.

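18.3 The usual split-plot design (Section 5.2) can be expressed as

$$y_{ijk} = \mu_{ik} + b_j + w_{ij} + \varepsilon_{ijk}, \quad i = 1,2,\ldots,a,\; j = 1,2,\ldots,b,\; \text{and } k = 1,2,\ldots,t$$

where $b_j \sim N(0, \sigma_{blk}^2)$, $w_{ij} \sim N(0, \sigma_{whole\text{-}plot}^2)$, and $\varepsilon_{ijk} \sim N(0, \sigma_\varepsilon^2)$.

Let $a = 3$, $b = 4$, and $t = 2$, then use Hartley's method of synthesis to evaluate the
expected values of the three sums of squares

$$SS(\text{blocks}) = at\sum_{j=1}^{b}(\bar{y}_{.j.} - \bar{y}_{...})^2$$

$$SS(\text{whole-plot error}) = t\sum_{i=1}^{a}\sum_{j=1}^{b}(\bar{y}_{ij.} - \bar{y}_{.j.} - \bar{y}_{i..} + \bar{y}_{...})^2$$

and

$$SS(\text{subplot error}) = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{t}(y_{ijk} - \bar{y}_{ij.} - \bar{y}_{i.k} + \bar{y}_{i..})^2$$

For extra credit, use the algebraic method to evaluate the expectations of these
sums of squares for general $a$, $b$, and $t$.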

18.4 Use the assumptions for the model in Section 18.3 and evaluate the expectations
of the type I sums of squares given by $R(t \mid \mu)$, $R(b \mid \mu, t)$, $R(g \mid \mu, b, t)$, and
$SSERROR(\mu, b, t, g)$.




Methods for Estimating Variance Components 


There are several ways to estimate variance components for the general random effects 
model. Some of the procedures yield the same estimators when the design is balanced 
(equal sample sizes per cell and no missing cells) and different estimators when the design 
is not balanced. The four techniques discussed in this chapter are the method of moments,
maximum likelihood, restricted or residual maximum likelihood (REML), and MIVQUE.
The method of moments produces unbiased estimates, maximum likelihood and REML 
estimators are consistent and have the usual large-sample-size properties of maximum 
likelihood estimates, and the MIVQUE method produces estimates having minimum 
variance within the class of quadratic unbiased estimates. When the design is balanced 
and the solutions for the variance components are all positive, the method of moments, 
REML, and MIVQUE estimators are identical. When the design is unbalanced, method-of- 
moments estimates are easiest to compute, while the other three methods require iterative 
algorithms. On the other hand, the maximum likelihood, REML, and MIVQUE methods 
provide estimators with better properties than does the method of moments. REML is 
generally the preferred method of estimating the variance components. 


19.1 Method of Moments 

The method of moments has been used to obtain estimates of variance components since 
Eisenhart (1947) gave the name MODEL II to the random effects model. Many researchers 
worked on the method of moments over the next 20 years and derived estimators and 
developed methods for testing hypotheses, and methods to construct confidence intervals 
about variance components (see Searle, 1987; Graybill, 1976; Henderson 1984; Searle et al. 
1992; Burdick and Graybill, 1992 for good lists of references). In this section, a generalized 
version of the method of moments estimation process is discussed. 

The general random effects model can be written as 


$$y = j_N\mu + Z_1u_1 + Z_2u_2 + \cdots + Z_ru_r + \varepsilon$$

where

$$E(u_i) = 0, \quad i = 1,2,\ldots,r$$

$$\mathrm{Var}(u_i) = \sigma_i^2 I_{t_i}, \quad i = 1,2,\ldots,r \tag{19.1}$$

$$E(\varepsilon) = 0, \quad \mathrm{Var}(\varepsilon) = \sigma_\varepsilon^2 I_N$$

and $u_1, u_2, \ldots, u_r$, and $\varepsilon$ are independent random vectors.

The method of moments technique for estimating the variance components of the 
general random effects model of Equation 19.1 involves the following steps: 

1) Compute as many sums of squares and their corresponding mean squares as there 
are variance components in the model. 

2) Evaluate the expectation of each mean square in terms of the variance components;
these expectations must not involve $\mu$ (or any other fixed effect parameters)
and each variance component must be included in the expectation of at least one
of the mean squares.

3) Equate the expectation of the mean squares to the observed values of the mean 
squares, thus generating a system of linear equations in the variance components 
(replace the variance component parameters with variance component solutions 
in the set of equations). 

4) Solve the resulting system of equations to obtain an estimate of each of the variance 
components. 

One problem with the method of moments solution is that some of the estimates of the
variance components can have negative values. When the solution for a variance component
is negative, the estimator of the variance component is set to zero (keeping the
estimator in the parameter space).

When the random effects $u_i$, $i = 1,2,\ldots,r$, and $\varepsilon$ of model (19.1) are jointly independent
and normally distributed and when the sums of squares are distributed independently of
one another, the resulting estimators of the variance components are minimum variance
unbiased. If $u_i$, $i = 1,2,\ldots,r$, and $\varepsilon$ have the same first four moments as those of a normal
distribution, the estimators are minimum variance quadratic unbiased (Graybill, 1976,
p. 632). The method of moments technique does not require an assumption of normality in
order to obtain estimators. The only known property these estimators possess without the
assumption of normality or the assumption that the distributions of the random vectors
have the same first four moments as a normal distribution is that they are unbiased.
However, the process of setting the estimator equal to zero when the solution is negative
implies that the estimators are no longer unbiased.

The key to the method of moments is determining how to compute sums of squares and
then evaluating the expectations of the resulting mean squares. These topics were discussed
in Chapter 18.

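If the model has $r + 1$ variance components, $\sigma_\varepsilon^2, \sigma_1^2, \sigma_2^2, \ldots, \sigma_r^2$, then $r + 1$ sums of squares
(or mean squares) are required. Let $Q_0 = y'A_0y$, $Q_1 = y'A_1y$, ..., $Q_r = y'A_ry$ denote the sums
of squares with respective expectations given by

$$E(Q_i) = b_{i0}\sigma_\varepsilon^2 + b_{i1}\sigma_1^2 + b_{i2}\sigma_2^2 + \cdots + b_{ir}\sigma_r^2, \quad i = 0,1,2,\ldots,r$$

Equate each sum of squares to its expectation, inserting a tilde (~) over the variances to
denote a solution as

$$Q_i = b_{i0}\tilde{\sigma}_\varepsilon^2 + b_{i1}\tilde{\sigma}_1^2 + b_{i2}\tilde{\sigma}_2^2 + \cdots + b_{ir}\tilde{\sigma}_r^2, \quad i = 0,1,2,\ldots,r$$

or, in matrix notation, as

$$\begin{bmatrix} Q_0 \\ Q_1 \\ Q_2 \\ \vdots \\ Q_r \end{bmatrix} = \begin{bmatrix} b_{00} & b_{01} & b_{02} & \cdots & b_{0r} \\ b_{10} & b_{11} & b_{12} & \cdots & b_{1r} \\ b_{20} & b_{21} & b_{22} & \cdots & b_{2r} \\ \vdots & \vdots & \vdots & & \vdots \\ b_{r0} & b_{r1} & b_{r2} & \cdots & b_{rr} \end{bmatrix} \begin{bmatrix} \tilde{\sigma}_\varepsilon^2 \\ \tilde{\sigma}_1^2 \\ \tilde{\sigma}_2^2 \\ \vdots \\ \tilde{\sigma}_r^2 \end{bmatrix}$$

or $Q = B\tilde{\sigma}^2$.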

If the rank of the matrix B is r + 1, then all of the variance components are estimable. If 
the rank of B is less than r + 1, then not all of the variance components are estimable and 
only some linear combinations of the variance components are estimable. Assuming that 
the rank of B is r + 1, the solution to the system of equations is 

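$$\tilde{\sigma}^2 = B^{-1}Q = CQ \quad \text{(say)}$$

The solution is obtained without restricting the values to the parameter space; that is,
some of the solutions may be negative. The solution for $\sigma_i^2$ is denoted by $\tilde{\sigma}_i^2$ and the estimate
is denoted by $\hat{\sigma}_i^2$, where

$$\hat{\sigma}_i^2 = \begin{cases} \tilde{\sigma}_i^2 & \text{if } \tilde{\sigma}_i^2 > 0 \\ 0 & \text{if } \tilde{\sigma}_i^2 \le 0 \end{cases} \qquad i = 0,1,2,\ldots,r$$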


In many models and methods for computing sums of squares, the $B$ matrix is triangular
and thus the solution can be obtained without inverting $B$. Each solution is a linear combination
of the observed sums of squares, $Q_0, Q_1, Q_2, \ldots, Q_r$, as

$$\tilde{\sigma}_i^2 = c_{i0}Q_0 + c_{i1}Q_1 + c_{i2}Q_2 + \cdots + c_{ir}Q_r, \quad i = 0,1,2,\ldots,r$$

where $c_i' = [c_{i0}\; c_{i1}\; c_{i2}\; \cdots\; c_{ir}]$ is the $i$th row of $C = B^{-1}$. The variance of $\tilde{\sigma}_i^2$ is

$$\mathrm{Var}(\tilde{\sigma}_i^2) = \mathrm{Var}(c_{i0}Q_0 + c_{i1}Q_1 + c_{i2}Q_2 + \cdots + c_{ir}Q_r)$$

When the $Q_i$, $i = 0,1,\ldots,r$, are uncorrelated, the variance of $\tilde{\sigma}_i^2$ is

$$\mathrm{Var}(\tilde{\sigma}_i^2) = c_{i0}^2\mathrm{Var}(Q_0) + c_{i1}^2\mathrm{Var}(Q_1) + c_{i2}^2\mathrm{Var}(Q_2) + \cdots + c_{ir}^2\mathrm{Var}(Q_r)$$


A summary of method-of-moments estimators and their variances for several models is
presented in Searle (1971, Chapter 11). For most balanced models (assuming the moments
of the distributions of the random variables correspond to the first four moments of a normal
distribution), the method of moments estimators are uniformly minimum variance
unbiased estimators of the variance components (Graybill, 1976). Thus, for nearly balanced
models, the method-of-moments estimators should have fairly good properties.
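
The four steps of the method can be sketched as a small linear solve followed by truncation at zero. The code below is a minimal illustration in plain Python; the function names and the matrix $B$ and vector $Q$ are hypothetical numbers chosen so that one solution is negative, not values from any example in the text.

```python
def solve_linear_system(b, q):
    """Solve B x = q by Gaussian elimination with partial pivoting."""
    n = len(q)
    aug = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(b, q)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            # rank(B) < r + 1: not every variance component is estimable
            raise ValueError("B is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def method_of_moments(b, q):
    """Return (solutions, estimates); estimates are truncated at zero."""
    solutions = solve_linear_system(b, q)
    estimates = [max(s, 0.0) for s in solutions]
    return solutions, estimates


# Hypothetical triangular system with three variance components.
B = [[8, 0, 0],
     [2, 6, 0],
     [1, 2, 7]]
Q = [32, 20, 3]
solutions, estimates = method_of_moments(B, Q)
print(solutions, estimates)
```

Because this $B$ is triangular, the same answer could be read off by forward substitution; the general elimination routine is used only so the sketch also covers non-triangular systems.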

19.1.1 Applications. Example 19.1: Unbalanced One-Way Model 

The unbalanced one-way random effects model of Example 18.2 is 

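$$y_{ij} = \mu + u_i + \varepsilon_{ij}, \quad i = 1,2,\ldots,t \text{ and } j = 1,2,\ldots,n_i$$

where the $u_i$ are uncorrelated with mean 0 and variance $\sigma_u^2$, the $\varepsilon_{ij}$ are uncorrelated with
mean 0 and variance $\sigma_\varepsilon^2$, and the $u_i$ and the $\varepsilon_{ij}$ are uncorrelated. Two sums of squares that
can be used are the sum of squares within, $Q_0$ or SSW, and the sum of squares between,
$Q_1$ or SSB, where

$$Q_0 = \sum_{i=1}^{t}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{i.})^2 = \sum_{i=1}^{t}\sum_{j=1}^{n_i}y_{ij}^2 - \sum_{i=1}^{t}n_i\bar{y}_{i.}^2 = SSW$$

and

$$Q_1 = \sum_{i=1}^{t}n_i(\bar{y}_{i.} - \bar{y}_{..})^2 = \sum_{i=1}^{t}n_i\bar{y}_{i.}^2 - \frac{1}{N}\left(\sum_{i=1}^{t}n_i\bar{y}_{i.}\right)^2 = SSB$$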

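The expectations of $Q_0$ and $Q_1$ were evaluated in Chapter 18 as

$$E(Q_0) = (N - t)\sigma_\varepsilon^2$$

and

$$E(Q_1) = (t - 1)\sigma_\varepsilon^2 + \left(N - \frac{\sum_{i=1}^{t}n_i^2}{N}\right)\sigma_u^2$$

where $N = \sum_{i=1}^{t}n_i$.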

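The equations obtained by equating the sums of squares to their expectations are

$$Q_0 = (N - t)\tilde{\sigma}_\varepsilon^2$$

$$Q_1 = (t - 1)\tilde{\sigma}_\varepsilon^2 + \left(N - \frac{\sum_{i=1}^{t}n_i^2}{N}\right)\tilde{\sigma}_u^2$$

or, in matrix notation,

$$\begin{bmatrix} Q_0 \\ Q_1 \end{bmatrix} = \begin{bmatrix} N - t & 0 \\ t - 1 & N - \dfrac{\sum_{i=1}^{t}n_i^2}{N} \end{bmatrix} \begin{bmatrix} \tilde{\sigma}_\varepsilon^2 \\ \tilde{\sigma}_u^2 \end{bmatrix}$$

The set of equations can also be generated by equating the observed mean squares to
their expectations, since the mean squares are obtained by dividing each sum of squares by its
corresponding degrees of freedom. The solutions are the same. The resulting system of
equations to be solved involving the mean squares is

$$\frac{Q_0}{N - t} = \tilde{\sigma}_\varepsilon^2$$

and

$$\frac{Q_1}{t - 1} = \tilde{\sigma}_\varepsilon^2 + \frac{N - \dfrac{\sum_{i=1}^{t}n_i^2}{N}}{t - 1}\,\tilde{\sigma}_u^2$$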

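The solution to this system of equations is

$$\tilde{\sigma}_\varepsilon^2 = \frac{Q_0}{N - t}$$

and

$$\tilde{\sigma}_u^2 = \frac{Q_1 - (t - 1)\tilde{\sigma}_\varepsilon^2}{N - \dfrac{\sum_{i=1}^{t}n_i^2}{N}}$$

The method-of-moments estimators are

$$\hat{\sigma}_\varepsilon^2 = \tilde{\sigma}_\varepsilon^2$$

and

$$\hat{\sigma}_u^2 = \begin{cases} \tilde{\sigma}_u^2 & \text{if } \tilde{\sigma}_u^2 > 0 \\ 0 & \text{if } \tilde{\sigma}_u^2 \le 0 \end{cases}$$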


19.1.2 Example 19.2: Wheat Varieties in a One-Way Random Effects Model 

An experimenter randomly selected four varieties of wheat from a population of varieties
of wheat and conducted an experiment to evaluate damage caused by insects on the
wheat plants just prior to heading. The design structure was a completely randomized
design with four replications or plots per variety (the plot is the experimental unit).
Because of environmental conditions, some of the plots were destroyed (flooded out by
excess rain). A day just before the wheat plants started to head, the experimenter
randomly selected 20 plants from each plot and rated the amount of insect damage done
to each plant using a scale from 0 to 10 where 0 indicates no damage and 10 indicates
severe damage. Thus, the response measured on each plot is the mean of the ratings
from the 20 plants. The data are shown in Table 19.1. The computations necessary to
compute the sums of squares, $Q_0$ and $Q_1$, are given in Table 19.1 along with the resulting
sums of squares, mean squares, the expected mean squares, the system of equations, the
resulting solution, and the estimates of the variance components. The information
obtained from the estimates of the variance components is that the plot-to-plot variance
within a variety is about 0.056, while the variance of the population of varieties is about
0.067. The variance of a randomly selected plot planted to a randomly selected variety is
the sum of the two variance components or

$$\hat{\sigma}_{damage}^2 = \hat{\sigma}_\varepsilon^2 + \hat{\sigma}_{Var}^2 = 0.056 + 0.067 = 0.123$$


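TABLE 19.1

Data and Computations for Insect Damage on
Wheat Varieties of Example 19.2

              Variety
   A        B        C        D
 3.90     3.60     4.15     3.35
 4.05     4.20     4.60     3.80
 4.25     4.05     4.15
          3.85     4.40

$$y_{..} = 52.35, \qquad N = 13, \qquad \sum_{i=1}^{4}\sum_{j=1}^{n_i}y_{ij}^2 = 212.1275, \qquad \sum_{i=1}^{4}n_i\bar{y}_{i.}^2 = 211.61958$$

$$Q_0 = 0.50792, \qquad Q_1 = 0.81016$$

Expected Mean Squares

$$E\left(\frac{Q_0}{9}\right) = \sigma_\varepsilon^2, \qquad E\left(\frac{Q_1}{3}\right) = \sigma_\varepsilon^2 + 3.1795\,\sigma_{Var}^2$$

System of Equations

$$\frac{Q_0}{9} = 0.05644 = \tilde{\sigma}_\varepsilon^2$$

$$\frac{Q_1}{3} = 0.27005 = \tilde{\sigma}_\varepsilon^2 + 3.1795\,\tilde{\sigma}_{Var}^2$$

Solution to System of Equations

$$\tilde{\sigma}_\varepsilon^2 = 0.05644, \qquad \tilde{\sigma}_{Var}^2 = 0.06719$$

Estimates of the Variance Components

$$\hat{\sigma}_\varepsilon^2 = 0.05644, \qquad \hat{\sigma}_{Var}^2 = 0.06719$$

The estimate of the intraclass correlation is

$$\hat{\rho} = \frac{\hat{\sigma}_{Var}^2}{\hat{\sigma}_\varepsilon^2 + \hat{\sigma}_{Var}^2} = \frac{0.067}{0.123} = 0.545$$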
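
The computations of Example 19.2 can be reproduced with a short script. This is a sketch rather than the authors' code: the variable names are illustrative, and the assignment of observations to varieties assumes Table 19.1 is read column by column (3, 4, 4, and 2 surviving plots for varieties A to D), which matches the totals reported in the table.

```python
# Plot means from Table 19.1, grouped by variety.
data = {
    "A": [3.90, 4.05, 4.25],
    "B": [3.60, 4.20, 4.05, 3.85],
    "C": [4.15, 4.60, 4.15, 4.40],
    "D": [3.35, 3.80],
}

t = len(data)
N = sum(len(v) for v in data.values())
grand_total = sum(sum(v) for v in data.values())
sum_sq = sum(y * y for v in data.values() for y in v)
group_term = sum(sum(v) ** 2 / len(v) for v in data.values())  # sum n_i * ybar_i.^2

q0 = sum_sq - group_term                  # SSW
q1 = group_term - grand_total ** 2 / N    # SSB

ms_within = q0 / (N - t)
ms_between = q1 / (t - 1)
coef = (N - sum(len(v) ** 2 for v in data.values()) / N) / (t - 1)

sol_e = ms_within                         # solution for sigma_e^2
sol_var = (ms_between - ms_within) / coef # solution for sigma_Var^2
est_e = max(sol_e, 0.0)
est_var = max(sol_var, 0.0)
rho = est_var / (est_e + est_var)

print(round(q0, 5), round(q1, 5), round(coef, 4))
print(round(est_e, 5), round(est_var, 5), round(rho, 3))
```

With unrounded variance components the intraclass correlation comes out near 0.543; the value 0.545 in the text is the ratio of the rounded figures 0.067 and 0.123.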


When experiments are done with treatment structures that are other than one-way or 
other than a completely randomized design structure, there is no universally accepted 
technique for obtaining sums of squares from which to derive estimates of the variance 
components. The methods presented in Chapter 18 for computing sums of squares are 
used to estimate the variance components for a two-way random effects model. 

19.1.3 Example 19.3: Data for Two-Way Design in Table 18.2 

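The data in Table 19.2 are observations for Figure 18.1, where the expectations of several
types of sums of squares were evaluated via synthesis. The values $Q_0$, $Q_1$, $Q_2$, and $Q_3$, which
correspond to Henderson's method I sums of squares, are

$$Q_0 = SSError = 30.6666$$

$$Q_1 = SSA = \sum_{i=1}^{2}\frac{y_{i..}^2}{n_{i.}} - \frac{y_{...}^2}{n_{..}} = 0.6428$$

$$Q_2 = SSB = \sum_{j=1}^{3}\frac{y_{.j.}^2}{n_{.j}} - \frac{y_{...}^2}{n_{..}} = 15.2143$$

$$Q_3 = SSAB = \sum_{i=1}^{2}\sum_{j=1}^{3}\frac{y_{ij.}^2}{n_{ij}} - \sum_{i=1}^{2}\frac{y_{i..}^2}{n_{i.}} - \sum_{j=1}^{3}\frac{y_{.j.}^2}{n_{.j}} + \frac{y_{...}^2}{n_{..}} = 108.6905$$

The expectations of these sums of squares are:

$$E(Q_0) = 8\sigma_\varepsilon^2$$

$$E(Q_1) = \sigma_\varepsilon^2 + 0.1429\sigma_b^2 + 2.4286\sigma_c^2 + 7.0\sigma_a^2$$

$$E(Q_2) = 2\sigma_\varepsilon^2 + 9.286\sigma_b^2 + 4.77\sigma_c^2 + 0.20\sigma_a^2$$

$$E(Q_3) = 2\sigma_\varepsilon^2 + 6.37\sigma_c^2$$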


TABLE 19.2

Data for a Two-Way Random Effects Treatment
Structure for Example 19.3

                      Column Treatments
Row Treatments       1        2        3
      1             10       13       21
                    12       15       19
                    11
      2             16       13       11
                    18       19       13
                                      14

Equate the values of the sums of squares to their respective expected values to provide
the following system of equations

$$30.6666 = 8\tilde{\sigma}_\varepsilon^2$$

$$0.6428 = \tilde{\sigma}_\varepsilon^2 + 0.1429\tilde{\sigma}_b^2 + 2.4286\tilde{\sigma}_c^2 + 7.0\tilde{\sigma}_a^2$$

$$15.2143 = 2\tilde{\sigma}_\varepsilon^2 + 9.286\tilde{\sigma}_b^2 + 4.77\tilde{\sigma}_c^2 + 0.20\tilde{\sigma}_a^2$$

$$108.6905 = 2\tilde{\sigma}_\varepsilon^2 + 6.37\tilde{\sigma}_c^2$$

The solution to the system of equations

$$\tilde{\sigma}_\varepsilon^2 = 3.83325, \quad \tilde{\sigma}_c^2 = 15.8593, \quad \tilde{\sigma}_b^2 = -10.8817, \quad \tilde{\sigma}_a^2 = -8.2523$$

provides unbiased estimates of the variance components. Because some of the above
values are negative, the final method of moments estimates of the variance components
are taken to be

$$\hat{\sigma}_\varepsilon^2 = 3.83325, \quad \hat{\sigma}_c^2 = 15.8593, \quad \hat{\sigma}_b^2 = 0.00, \quad \hat{\sigma}_a^2 = 0.00$$

The results from Example 19.3 point out one of the problems often encountered when 
using the method of moments technique: It can yield negative solutions for the variance 
components, which are not admissible as estimators. Listed below are the method of 
moments solutions and the resulting estimates for Example 19.3 obtained by solving the 
systems of equations generated by the method of fitting constants or Henderson's method 
III sums of squares (SAS® type I) and the SAS type III sums of squares (the expectations of 
these sums of squares were evaluated by synthesis and are given in Table 18.4). 

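The type I solutions are

$$\tilde{\sigma}_\varepsilon^2 = 3.8333$$

$$\tilde{\sigma}_c^2 = \frac{109.1451 - 2(3.8333)}{4.5178} = 22.4620$$

$$\tilde{\sigma}_b^2 = \frac{14.5797 - 2(3.8333) - 4.6252(22.4620)}{9.1429} = -10.5877$$

$$\tilde{\sigma}_a^2 = \frac{0.6429 - 3.833 - 0.1428(-10.71322) - 2.4284(22.4620)}{7.00} = -8.0329$$

and the type I estimates are

$$\hat{\sigma}_\varepsilon^2 = 3.8333, \quad \hat{\sigma}_c^2 = 22.4620, \quad \hat{\sigma}_b^2 = 0, \quad \hat{\sigma}_a^2 = 0$$

The type III solutions are

$$\tilde{\sigma}_\varepsilon^2 = 3.8333$$

$$\tilde{\sigma}_c^2 = \frac{109.1451 - 2(3.8333)}{4.5178} = 22.4620$$

$$\tilde{\sigma}_b^2 = \frac{8.9098 - 2(3.8333) - 4.5178(22.4620)}{9.0353} = -11.1080$$

$$\tilde{\sigma}_a^2 = \frac{0.677 - 3.833 - 2.2500(22.4620)}{6.75} = -8.0305$$

and the type III estimates are

$$\hat{\sigma}_\varepsilon^2 = 3.8333, \quad \hat{\sigma}_c^2 = 22.4620, \quad \hat{\sigma}_b^2 = 0, \quad \hat{\sigma}_a^2 = 0$$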



When the solution for a variance component is negative, the standard process is to set
the corresponding estimate to zero. To demonstrate the consequences of this process, the
expected mean squares for the mean square within and mean square between for a one-way
random effects model can be expressed as

$$E(MSWithin) = \sigma_\varepsilon^2$$

$$E(MSBetween) = \sigma_\varepsilon^2 + c\,\sigma_u^2$$

The solution for $\sigma_u^2$ is

$$\tilde{\sigma}_u^2 = \frac{MSBetween - MSWithin}{c}$$

Under the normality assumption, MSWithin and MSBetween are independent random
variables. If $\sigma_u^2 = 0$, then both expected mean squares are equal to $\sigma_\varepsilon^2$; thus, the probability that
the solution for $\sigma_u^2$ is negative is approximately 0.50, depending on the degrees of freedom.
If the numerator and denominator degrees of freedom are equal, then

$$P(MSBetween < MSWithin \mid \sigma_u^2 = 0) = 0.50$$

As $\sigma_u^2$ gets larger, the probability of getting a negative solution decreases. Thus, it is
reasonable to set $\hat{\sigma}_u^2$ equal to zero when the solution is negative (see Searle et al. 1992 for a
more detailed discussion). Confidence intervals and tests of hypotheses constructed from
method of moments estimators are discussed in Chapter 20.
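
The probability statement above can be checked by simulation. The sketch below assumes normality and $\sigma_u^2 = 0$, so each mean square is $\sigma_\varepsilon^2$ times an independent chi-square variable divided by its degrees of freedom ($\sigma_\varepsilon^2$ cancels and is taken as 1). The function name, seed, and replication count are arbitrary choices for illustration.

```python
import random


def prob_negative_solution(df_between, df_within, reps=20000, seed=1):
    """Monte Carlo estimate of P(MSBetween < MSWithin) when sigma_u^2 = 0."""
    rng = random.Random(seed)
    negatives = 0
    for _ in range(reps):
        # A chi-square with d degrees of freedom is Gamma(shape=d/2, scale=2).
        ms_between = rng.gammavariate(df_between / 2.0, 2.0) / df_between
        ms_within = rng.gammavariate(df_within / 2.0, 2.0) / df_within
        if ms_between < ms_within:
            negatives += 1
    return negatives / reps


p_equal = prob_negative_solution(5, 5)      # equal degrees of freedom
p_unequal = prob_negative_solution(3, 9)    # e.g. t - 1 = 3 and N - t = 9
print(p_equal, p_unequal)
```

With equal degrees of freedom the estimate should be close to 0.50; with unequal degrees of freedom it moves away from 0.50 but stays in the same general range.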


19.2 Maximum Likelihood Estimators 

In statistics, the most common technique for estimating parameters of a distribution is the
method of maximum likelihood. The process uses the assumed distribution of the observations
and constructs a likelihood function which is a function of the data and the
unknown model parameters. The maximum likelihood estimators are those values of the
parameters from the parameter space that maximize the value of the likelihood function.
In practice, the $\log_e$ of the likelihood function is maximized. Equivalently, one can find the
values of the parameters in the parameter space that minimize $-2\log_e$ of the likelihood
function. The parameter space for the general random effects model (19.1) is

$$\{-\infty < \mu < +\infty,\; 0 \le \sigma_i^2 < +\infty,\; i = 1,2,\ldots,k;\; 0 < \sigma_\varepsilon^2 < \infty\}$$

For the general random effects model 19.1, the distribution of the vector of observations
is

$$y \sim N(j_n\mu,\; \sigma_\varepsilon^2 I_n + \sigma_1^2 Z_1Z_1' + \sigma_2^2 Z_2Z_2' + \cdots + \sigma_k^2 Z_kZ_k')$$

or $y \sim N(j_n\mu, \Sigma)$.

The likelihood function of the observations is

$$L(\mu, \sigma_\varepsilon^2, \sigma_1^2, \sigma_2^2, \ldots, \sigma_k^2 \mid y) = (2\pi)^{-n/2}|\Sigma|^{-1/2}\exp\left[-\tfrac{1}{2}(y - j_n\mu)'\Sigma^{-1}(y - j_n\mu)\right]$$

and $-2\log_e$ of the likelihood function is

$$\ell(\mu, \sigma_\varepsilon^2, \sigma_1^2, \sigma_2^2, \ldots, \sigma_k^2 \mid y) = -2\log_e[L(\mu, \sigma_\varepsilon^2, \sigma_1^2, \sigma_2^2, \ldots, \sigma_k^2 \mid y)] = n\log_e(2\pi) + \log_e(|\Sigma|) + (y - j_n\mu)'\Sigma^{-1}(y - j_n\mu)$$

The process of minimizing $\ell(\mu, \sigma_\varepsilon^2, \sigma_1^2, \sigma_2^2, \ldots, \sigma_k^2 \mid y)$ over the parameter space generally
requires an iterative procedure utilizing likelihood equations generated by taking either
the first derivatives or the first and second derivatives of $\ell(\mu, \sigma_\varepsilon^2, \sigma_1^2, \sigma_2^2, \ldots, \sigma_k^2 \mid y)$ with
respect to each of the parameters of the model. When the data are from a balanced design
(equal $n$ and no missing cells), the set of likelihood equations generated by equating
the first derivatives of the likelihood function to zero can often be solved explicitly. The
solutions obtained are not restricted to the parameter space since some of the
solutions for the variance components can have negative values. For some balanced
models, it can be shown that the maximum likelihood estimate of $\sigma_i^2$ is $\hat{\sigma}_i^2 = 0$ when the
solution for $\sigma_i^2$ from the likelihood equations is negative (Searle, 1971). For unbalanced
designs, an iterative technique is required where the estimation process should restrict the
estimates of the variance components to belong to the parameter space.


19.2.1 Example 19.4: Maximum Likelihood Solution for Balanced One-Way Model 

The balanced one-way random effects model can be expressed as 

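$$y_{ij} = \mu + u_i + \varepsilon_{ij}, \quad i = 1,2,\ldots,t \text{ and } j = 1,2,\ldots,n$$

where $u_i \sim$ i.i.d. $N(0, \sigma_u^2)$, $\varepsilon_{ij} \sim$ i.i.d. $N(0, \sigma_\varepsilon^2)$, and the $u_i$ and the $\varepsilon_{ij}$ are independent, or
$y \sim N((j_n \otimes j_t)\mu, \Sigma)$ where $\Sigma = \sigma_u^2 J_n \otimes I_t + \sigma_\varepsilon^2 I_n \otimes I_t$.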

The notation A ® B denotes the Kronecker or direct product of the two matrices, A and 
B (Graybill, 1976). The covariance matrix E can be expressed as 


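$$\Sigma = \sigma_\varepsilon^2\left[\left(I_n - \frac{1}{n}J_n\right) \otimes I_t\right] + (\sigma_\varepsilon^2 + n\sigma_u^2)\left[\left(\frac{1}{n}J_n\right) \otimes I_t\right]$$

where $\sigma_\varepsilon^2$ and $(\sigma_\varepsilon^2 + n\sigma_u^2)$ are the characteristic roots of $\Sigma$ and $[(I_n - (1/n)J_n) \otimes I_t]$ and
$[((1/n)J_n) \otimes I_t]$ are orthogonal idempotent matrices.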

The inverse of the covariance matrix is 


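$$\Sigma^{-1} = \frac{1}{\sigma_\varepsilon^2}\left[\left(I_n - \frac{1}{n}J_n\right) \otimes I_t\right] + \frac{1}{\sigma_\varepsilon^2 + n\sigma_u^2}\left[\left(\frac{1}{n}J_n\right) \otimes I_t\right]$$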


Using this representation of the inverse of the covariance matrix, the following can be 
obtained 


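$$|\Sigma| = (\sigma_\varepsilon^2)^{t(n-1)}(\sigma_\varepsilon^2 + n\sigma_u^2)^{t}$$

or

$$\log_e|\Sigma| = t(n - 1)\log_e(\sigma_\varepsilon^2) + t\log_e(\sigma_\varepsilon^2 + n\sigma_u^2)$$

and

$$(y - j_{nt}\mu)'\Sigma^{-1}(y - j_{nt}\mu) = \frac{nt(\bar{y}_{..} - \mu)^2}{\sigma_\varepsilon^2 + n\sigma_u^2} + \frac{SSE}{\sigma_\varepsilon^2} + \frac{SSU}{\sigma_\varepsilon^2 + n\sigma_u^2}$$

where

$$SSE = \sum_{i=1}^{t}\sum_{j=1}^{n}(y_{ij} - \bar{y}_{i.})^2 \quad \text{and} \quad SSU = n\sum_{i=1}^{t}(\bar{y}_{i.} - \bar{y}_{..})^2$$

Using these expressions, $-2\log_e L(\mu, \sigma_\varepsilon^2, \sigma_u^2 \mid y)$ can be written as

$$\ell(\mu, \sigma_\varepsilon^2, \sigma_u^2 \mid y) = tn\log_e(2\pi) + t(n - 1)\log_e(\sigma_\varepsilon^2) + t\log_e(\sigma_\varepsilon^2 + n\sigma_u^2) + \frac{nt(\bar{y}_{..} - \mu)^2}{\sigma_\varepsilon^2 + n\sigma_u^2} + \frac{SSE}{\sigma_\varepsilon^2} + \frac{SSU}{\sigma_\varepsilon^2 + n\sigma_u^2}$$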


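The likelihood equations are obtained by differentiating $\ell(\mu, \sigma_\varepsilon^2, \sigma_u^2 \mid y)$ with respect to the
three parameters, $\mu$, $\sigma_\varepsilon^2$, and $\sigma_u^2$, and then setting the derivatives equal to zero. The derivatives,
evaluated at the solution for the parameters and set equal to zero, are

$$\frac{\partial \ell(\mu, \sigma_\varepsilon^2, \sigma_u^2 \mid y)}{\partial \mu} = \frac{-2nt(\bar{y}_{..} - \tilde{\mu})}{\tilde{\sigma}_\varepsilon^2 + n\tilde{\sigma}_u^2} = 0$$

$$\frac{\partial \ell(\mu, \sigma_\varepsilon^2, \sigma_u^2 \mid y)}{\partial \sigma_\varepsilon^2} = \frac{t(n - 1)}{\tilde{\sigma}_\varepsilon^2} + \frac{t}{\tilde{\sigma}_\varepsilon^2 + n\tilde{\sigma}_u^2} - \frac{nt(\bar{y}_{..} - \tilde{\mu})^2}{(\tilde{\sigma}_\varepsilon^2 + n\tilde{\sigma}_u^2)^2} - \frac{SSE}{(\tilde{\sigma}_\varepsilon^2)^2} - \frac{SSU}{(\tilde{\sigma}_\varepsilon^2 + n\tilde{\sigma}_u^2)^2} = 0$$

$$\frac{\partial \ell(\mu, \sigma_\varepsilon^2, \sigma_u^2 \mid y)}{\partial \sigma_u^2} = \frac{nt}{\tilde{\sigma}_\varepsilon^2 + n\tilde{\sigma}_u^2} - \frac{n^2t(\bar{y}_{..} - \tilde{\mu})^2}{(\tilde{\sigma}_\varepsilon^2 + n\tilde{\sigma}_u^2)^2} - \frac{n\,SSU}{(\tilde{\sigma}_\varepsilon^2 + n\tilde{\sigma}_u^2)^2} = 0$$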


The solution to the maximum likelihood equations is

$$\tilde{\mu} = \bar{y}_{..}, \qquad \tilde{\sigma}_\varepsilon^2 = \frac{SSE}{t(n - 1)} = MSError, \qquad \text{and} \qquad \tilde{\sigma}_u^2 = \frac{1}{n}\left[\frac{SSU}{t} - MSError\right]$$

Thus, the maximum likelihood estimates are

$$\hat{\mu} = \bar{y}_{..}, \qquad \hat{\sigma}_\varepsilon^2 = \tilde{\sigma}_\varepsilon^2 = \frac{SSE}{t(n - 1)} = MSError$$

and

$$\hat{\sigma}_u^2 = \begin{cases} \tilde{\sigma}_u^2 & \text{if } \tilde{\sigma}_u^2 > 0 \\ 0 & \text{if } \tilde{\sigma}_u^2 \le 0 \end{cases}$$

When the estimate of $\sigma_u^2$ is zero, then the estimate of $\sigma_\varepsilon^2$ is recomputed by pooling SSE and
SSU as well as their degrees of freedom to obtain

$$\hat{\sigma}_\varepsilon^2 = \frac{SSE + SSU}{tn - 1}$$

If there happens to be a negative intraclass correlation, then the estimate of $\sigma_\varepsilon^2$ obtained
by pooling will be an underestimate of the variance. The assumptions and their
appropriateness must be investigated carefully because if there is a degree of
competition among experimental units within a level of $u_i$, then a negative correlation
would be appropriate. In this case, the covariance matrix could be expressed as

$$\Sigma = \sigma^2\rho\,[J_n \otimes I_t] + \sigma^2(1 - \rho)\,[I_n \otimes I_t]$$

where $\sigma_\varepsilon^2 = \sigma^2(1 - \rho)$ and $\sigma_u^2 = \sigma^2\rho$.
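
The closed-form maximum likelihood solution for the balanced one-way model can be checked numerically. The sketch below uses a small hypothetical data set (3 levels, 4 observations each) and illustrative variable names; it computes the solution from SSE and SSU and confirms that perturbing the parameters does not lower $-2\log_e$ of the likelihood.

```python
import math

# Hypothetical balanced data: t = 3 levels of the random factor, n = 4 each.
groups = [
    [10.0, 12.0, 11.0, 13.0],
    [15.0, 14.0, 16.0, 17.0],
    [9.0, 8.0, 10.0, 11.0],
]
t, n = len(groups), len(groups[0])
means = [sum(g) / n for g in groups]
grand_mean = sum(means) / t
sse = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
ssu = n * sum((m - grand_mean) ** 2 for m in means)
ms_error = sse / (t * (n - 1))


def neg2_loglik(mu, sig_e, sig_u):
    """-2 log-likelihood of the balanced one-way random effects model."""
    lam = sig_e + n * sig_u
    return (t * n * math.log(2 * math.pi) + t * (n - 1) * math.log(sig_e)
            + t * math.log(lam) + n * t * (grand_mean - mu) ** 2 / lam
            + sse / sig_e + ssu / lam)


mu_ml = grand_mean
sol_e = ms_error
sol_u = (ssu / t - ms_error) / n      # solution; may be negative in general
est_u = max(sol_u, 0.0)
# Pool SSE and SSU (divisor tn - 1, as in the text) when the estimate is zero.
est_e = sol_e if sol_u > 0 else (sse + ssu) / (t * n - 1)

best = neg2_loglik(mu_ml, sol_e, sol_u)
nearby = [
    neg2_loglik(mu_ml + dm, sol_e * fe, sol_u * fu)
    for dm in (-0.5, 0.0, 0.5)
    for fe in (0.9, 1.0, 1.1)
    for fu in (0.9, 1.0, 1.1)
]
print(sol_e, sol_u, min(nearby) >= best)
```

The perturbation check is only meaningful here because the solution for $\sigma_u^2$ is positive for this data set; with a negative solution the function would have to be minimized on the boundary instead.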

Several computational algorithms have been developed for maximizing the likelihood 
function, thus, providing maximum likelihood estimates of the model's parameters 
(values in the parameter space) (Hemmerle and Hartley, 1973; Corbeil and Searle, 1976). 
The large sample size variances of the maximum likelihood estimates can be obtained by 
inverting the matrix of second derivatives where the second derivatives are evaluated at 
the values of the maximum likelihood estimates. Maximum likelihood estimators and 
their variances have been obtained for several designed experiments and are reported in 
Searle (1971) and Searle et al. (1992). 

When using computer software to fit these models, the experimenter should thoroughly 
investigate the algorithm being used and determine its properties, that is, whether it 
always yields meaningful estimates and whether it maximizes the likelihood function 
over the parameter space. 

The maximum likelihood estimates of the variance components in Examples 19.2 and
19.3 were obtained by using SAS-Mixed. The maximum likelihood estimates for $\sigma_\varepsilon^2$ and $\sigma_{Var}^2$
in Example 19.2 are $\hat{\sigma}_\varepsilon^2 = 0.05749$ and $\hat{\sigma}_{Var}^2 = 0.04855$, as displayed in Table 19.3. The maximum
likelihood estimates for the parameters of the model in Example 19.3 are $\hat{\sigma}_\varepsilon^2 = 3.8428$,
$\hat{\sigma}_c^2 = 7.40708$, $\hat{\sigma}_b^2 = 0$, and $\hat{\sigma}_a^2 = 0$, as displayed in Table 19.4. The ML algorithm in SAS-Mixed
does the maximization over the parameter space, as is shown by the values of $\hat{\sigma}_a^2$ and $\hat{\sigma}_b^2$
being set equal to zero.


TABLE 19.3

Proc Mixed Code to Compute Maximum Likelihood Estimates of the Variance Components
and Mean for the Data in Example 19.2

proc mixed method=ml data=ex19_2 covtest cl;
   class variety;
   model damage=/solution;
   random variety;
run;

Covariance Parameter    Estimate    Standard Error    Lower CL    Upper CL
Variety                 0.04855     0.05075           0.01267     2.5700
Residual                0.05749     0.02760           0.02690     0.1972

Estimate of Mean        Standard Error    df    t-Value    Pr > |t|
$\hat{\mu}$ = 3.9909    0.1297            3     30.78      <0.0001


TABLE 19.4

Proc Mixed Code to Obtain Maximum Likelihood Estimates of the Variance Components
for the Two-Way Random Effects Model in Example 19.3

proc mixed data=ex19_3 covtest cl method=ml;
   title2 "Using Maximum Likelihood";
   class row col;
   model y=/solution;
   random row col row*col;
run;

Covariance Parameter    Estimate    Standard Error    Lower      Upper
Row                     0           -                 -          -
Col                     0           -                 -          -
Row x col               8.5604      5.9559            3.1104     67.2317
Residual                3.5989      1.8071            1.6376     13.3052

Estimate of Mean        Standard Error    df    t-Value    Pr > |t|
14.6910                 1.3008            1     11.29      0.0562


19.3 Restricted or Residual Maximum Likelihood Estimation 

Restricted or residual maximum likelihood estimates are obtained by maximizing that part
of the likelihood function that does not include any fixed effects or by maximizing the likelihood
function of the residuals after the fixed effects have been removed from the model.
For the models in this chapter there is just one fixed effect parameter, $\mu$. This is also equivalent
to looking at the conditional distribution of a set of sums of squares given the overall
sample mean. The process is accomplished by factoring the likelihood function into parts
where one part involves fixed effect parameters and a second part just involves the variance
components. The REML equations are obtained by differentiating $-2\log_e$ of the residual
likelihood with respect to the variance components and setting the derivatives equal to zero.

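For the general random effects model (19.1), the likelihood function of the observations is

$$L(\mu, \sigma_\varepsilon^2, \sigma_1^2, \sigma_2^2, \ldots, \sigma_k^2 \mid y) = (2\pi)^{-n/2}|\Sigma|^{-1/2}\exp\left[-\tfrac{1}{2}(y - j_n\mu)'\Sigma^{-1}(y - j_n\mu)\right]$$

and the $-2\log_e$ of the likelihood function is

$$\ell(\mu, \sigma_\varepsilon^2, \sigma_1^2, \sigma_2^2, \ldots, \sigma_k^2 \mid y) = -2\log_e[L(\mu, \sigma_\varepsilon^2, \sigma_1^2, \sigma_2^2, \ldots, \sigma_k^2 \mid y)]$$

$$= n\log_e(2\pi) + \log_e(|\Sigma|) + (y - j_n\mu)'\Sigma^{-1}(y - j_n\mu)$$

$$= \ell(\mu, \sigma_\varepsilon^2, \sigma_1^2, \sigma_2^2, \ldots, \sigma_k^2 \mid \bar{y}) + \ell(\sigma_\varepsilon^2, \sigma_1^2, \sigma_2^2, \ldots, \sigma_k^2 \mid SSE, SSU_1, SSU_2, \ldots, SSU_k)$$

where $SSE, SSU_1, SSU_2, \ldots, SSU_k$ denote an independent set of sums of squares that do not
depend on $\mu$ and $\ell(\sigma_\varepsilon^2, \sigma_1^2, \sigma_2^2, \ldots, \sigma_k^2 \mid SSE, SSU_1, SSU_2, \ldots, SSU_k)$ is the residual likelihood
function. The solution to the residual likelihood equations provides the REML estimates of
the variance components.

19.3.1 Example 19.5: REML Solution for Balanced One-Way Model

Using the balanced one-way random effects model described in Example 19.4, the $-2\log_e$
of the likelihood function can be expressed as a function of the sufficient statistics $\bar{y}_{..}$, SSE,
and SSU as

$$\ell(\mu, \sigma_\varepsilon^2, \sigma_u^2 \mid y) = tn\log_e(2\pi) + t(n - 1)\log_e(\sigma_\varepsilon^2) + t\log_e(\sigma_\varepsilon^2 + n\sigma_u^2) + \frac{nt(\bar{y}_{..} - \mu)^2}{\sigma_\varepsilon^2 + n\sigma_u^2} + \frac{SSE}{\sigma_\varepsilon^2} + \frac{SSU}{\sigma_\varepsilon^2 + n\sigma_u^2}$$

$$= \left[\log_e(2\pi) + \log_e(\sigma_\varepsilon^2 + n\sigma_u^2) + \frac{nt(\bar{y}_{..} - \mu)^2}{\sigma_\varepsilon^2 + n\sigma_u^2}\right] + \left[(tn - 1)\log_e(2\pi) + t(n - 1)\log_e(\sigma_\varepsilon^2) + (t - 1)\log_e(\sigma_\varepsilon^2 + n\sigma_u^2) + \frac{SSE}{\sigma_\varepsilon^2} + \frac{SSU}{\sigma_\varepsilon^2 + n\sigma_u^2}\right]$$

$$= \ell(\mu, \sigma_\varepsilon^2, \sigma_u^2 \mid \bar{y}_{..}) + \ell(\sigma_\varepsilon^2, \sigma_u^2 \mid SSE, SSU)$$

The residual likelihood function is $\ell(\sigma_\varepsilon^2, \sigma_u^2 \mid SSE, SSU)$ where

$$\ell(\sigma_\varepsilon^2, \sigma_u^2 \mid SSE, SSU) = (tn - 1)\log_e(2\pi) + t(n - 1)\log_e(\sigma_\varepsilon^2) + (t - 1)\log_e(\sigma_\varepsilon^2 + n\sigma_u^2) + \frac{SSE}{\sigma_\varepsilon^2} + \frac{SSU}{\sigma_\varepsilon^2 + n\sigma_u^2}$$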


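The restricted maximum likelihood equations are obtained by differentiating $\ell(\sigma_\varepsilon^2, \sigma_u^2 \mid SSE, SSU)$ with respect to the two parameters $\sigma_\varepsilon^2$ and $\sigma_u^2$ and then setting the derivatives equal to zero. The derivatives, evaluated at the solution for the parameters and set equal to zero, are

$$
\frac{\partial \ell(\sigma_\varepsilon^2, \sigma_u^2 \mid SSE, SSU)}{\partial \sigma_\varepsilon^2} = \frac{t(n-1)}{\tilde{\sigma}_\varepsilon^2} + \frac{t-1}{\tilde{\sigma}_\varepsilon^2 + n\tilde{\sigma}_u^2} - \frac{SSE}{(\tilde{\sigma}_\varepsilon^2)^2} - \frac{SSU}{(\tilde{\sigma}_\varepsilon^2 + n\tilde{\sigma}_u^2)^2} = 0
$$

$$
\frac{\partial \ell(\sigma_\varepsilon^2, \sigma_u^2 \mid SSE, SSU)}{\partial \sigma_u^2} = \frac{n(t-1)}{\tilde{\sigma}_\varepsilon^2 + n\tilde{\sigma}_u^2} - \frac{n\,SSU}{(\tilde{\sigma}_\varepsilon^2 + n\tilde{\sigma}_u^2)^2} = 0
$$

The solution to the residual maximum likelihood equations is

$$
\tilde{\sigma}_\varepsilon^2 = \frac{SSE}{t(n-1)} = \text{MSError}, \quad \text{and} \quad \tilde{\sigma}_u^2 = \frac{1}{n}\left[\frac{SSU}{t-1} - \text{MSError}\right] = \frac{1}{n}\left[\text{MSU} - \text{MSError}\right]
$$

The residual maximum likelihood estimates are

$$
\hat{\sigma}_\varepsilon^2 = \tilde{\sigma}_\varepsilon^2 = \frac{SSE}{t(n-1)} = \text{MSError}
$$

and

$$
\hat{\sigma}_u^2 = \begin{cases} \tilde{\sigma}_u^2 & \text{if } \tilde{\sigma}_u^2 \ge 0 \\ 0 & \text{if } \tilde{\sigma}_u^2 < 0 \end{cases}
$$

When the estimate of $\sigma_u^2$ is zero, the estimate of $\sigma_\varepsilon^2$ is recomputed by pooling SSE and SSU as well as their degrees of freedom to obtain

$$
\hat{\sigma}_\varepsilon^2 = \frac{SSE + SSU}{tn - 1}
$$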
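The closed-form REML solution above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `reml_balanced_one_way` and the small data sets in the usage lines are hypothetical (they are not the data of Examples 19.2 or 19.3), and the code assumes balanced data with t groups of n observations each.

```python
def reml_balanced_one_way(groups):
    """REML estimates for the balanced one-way random effects model.

    groups: list of t lists, each holding n observations (balanced data).
    Returns (sigma2_e_hat, sigma2_u_hat).
    """
    t = len(groups)
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("balanced data required: every group needs n observations")

    group_means = [sum(g) / n for g in groups]
    grand_mean = sum(group_means) / t

    # Within-group (SSE) and between-group (SSU) sums of squares.
    sse = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)
    ssu = n * sum((m - grand_mean) ** 2 for m in group_means)

    ms_error = sse / (t * (n - 1))
    ms_u = ssu / (t - 1)

    # Unrestricted solution of the residual likelihood equations.
    sigma2_e = ms_error
    sigma2_u = (ms_u - ms_error) / n

    if sigma2_u < 0:
        # The solution is outside the parameter space: set sigma2_u to zero
        # and pool SSE and SSU together with their degrees of freedom.
        sigma2_u = 0.0
        sigma2_e = (sse + ssu) / (t * n - 1)
    return sigma2_e, sigma2_u


# Hypothetical data: clear between-group variation.
est_positive = reml_balanced_one_way([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Hypothetical data: identical group means, so the pooled estimate is used.
est_pooled = reml_balanced_one_way([[1, 5], [2, 4], [3, 3]])
```

In the second call the unrestricted solution for the between-group component is negative, so the function returns zero for it and the pooled residual estimate (SSE + SSU)/(tn - 1).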

Tables 19.5 and 19.6 contain the SAS-Mixed code and results for extracting REML estimates of the variance components for the data sets in Examples 19.2 and 19.3, respectively.


TABLE 19.5

Proc Mixed Code to Compute Restricted Maximum Likelihood Estimates of the Variance Components and Mean for the Data in Example 19.2

proc mixed method=reml data=ex19_2 covtest cl;
   class variety;
   model damage=/solution;
   random variety;

Covariance Parameter   Estimate   Standard Error   Lower CL   Upper CL
Variety                0.07316    0.07802          0.01875    4.4672
Residual               0.05700    0.02713          0.02681    0.1929

Estimate of Mean       Standard Error   df   t-Value   Pr > |t|
$\hat{\mu}$ = 3.9863   0.1515           3    26.31     0.0001


TABLE 19.6

Proc Mixed Code to Obtain Restricted Maximum Likelihood Estimates of the Variance Components for the Two-Way Random Effects Model in Example 19.3

Proc Mixed data=ex19_3 covtest cl method=REML;
   title2 "Using Maximum Likelihood";
   class row col;
   model y=/solution;
   random row col row*col;

Covariance Parameter   Estimate   Standard Error   Lower     Upper
Row                    0          —                —         —
Col                    0          —                —         —
Row × col              9.2425     6.9923           3.1503    95.7334
Residual               3.8398     1.9228           1.7502    14.1289

Estimate of Mean        Standard Error   df   t-Value   Pr > |t|
$\hat{\mu}$ = 14.8547   1.3503           1    11.00     0.0577
19.4 MIVQUE Method

Rao (1971) described a general procedure for obtaining minimum variance quadratic unbiased estimators (MIVQUE) of the variance components. For the general random effects model of Section 18.2, the MIVQUE of a linear combination of variances

$$
\theta = c_0\sigma_\varepsilon^2 + c_1\sigma_1^2 + c_2\sigma_2^2 + \cdots + c_k\sigma_k^2
$$

is a quadratic function of the observations that is unbiased for $\theta$ and has minimum variance within the class of quadratic unbiased estimators of $\theta$. Thus, MIVQUE estimators of variance components possess the minimum variance property, whereas the method of moments estimators generally do not. Each individual variance component can be selected as a possible parameter to be estimated. Selecting $c_0 = 1$, $c_1 = c_2 = \cdots = c_k = 0$ provides $\theta = \sigma_\varepsilon^2$. Other choices of the $c_i$ values will provide $\theta = \sigma_i^2$ as well as other linear combinations of the variance components.


19.4.1 Description of the Method

The estimate of $\theta$ must be a quadratic function of $y$; thus, for some matrix $A$, the estimator of $\theta$ has the form $y'Ay$. The expectation of $y'Ay$ is

$$
E(y'Ay) = \mathrm{tr}(\Sigma A) + \mu^2 j_n' A j_n
$$

By assumption, $E(y'Ay) = \theta$. Since the expectation does not depend on $\mu$, $A$ must be chosen to satisfy $\mu^2 j_n' A j_n = 0$. Under the conditions of normality, the variance of $y'Ay$ when $\mu^2 j_n' A j_n = 0$ is

$$
\mathrm{Var}(y'Ay) = 2\,\mathrm{tr}[(\Sigma A)^2]
$$

Thus the MIVQUE of $\theta$ is $y'Ay$ where $A$ is chosen such that $\mathrm{tr}[\Sigma A] = \theta$ and $\mathrm{tr}[(\Sigma A)^2]$ is minimized over the parameter space

$$
\{0 < \sigma_\varepsilon^2 < \infty,\ 0 < \sigma_1^2 < \infty,\ 0 < \sigma_2^2 < \infty,\ \ldots,\ 0 < \sigma_k^2 < \infty\}
$$

Rao (1971) shows that the MIVQUE of $\sigma^2 = [\sigma_\varepsilon^2, \sigma_1^2, \sigma_2^2, \ldots, \sigma_k^2]'$ is $\hat{\sigma}^2 = S^{-1}f$ where $S$ is a $(k+1) \times (k+1)$ matrix with elements

$$
s_{ii'} = \mathrm{tr}[X_i X_i' R X_{i'} X_{i'}' R], \quad i, i' = 0, 1, 2, \ldots, k
$$

and $f$ is a $(k+1) \times 1$ vector with elements

$$
f_i = y' R X_i X_i' R y, \quad i = 0, 1, 2, \ldots, k
$$

and

$$
R = \Sigma^{-1}\left[I_n - j_n (j_n' \Sigma^{-1} j_n)^{-1} j_n' \Sigma^{-1}\right]
$$
The solution for $\hat{\sigma}^2$ depends on the elements of $\Sigma$, which are functions of the unknown variance components. In order to compute the MIVQUE of $\sigma^2$, some constants must be substituted into $\Sigma$ for $\sigma_\varepsilon^2, \sigma_1^2, \sigma_2^2, \ldots, \sigma_k^2$. For that set of constants, the estimate of $\sigma^2$ is MIVQUE (and is a quadratic function of $y$). In order for $\hat{\sigma}^2$ to be MIVQUE of $\sigma^2$, the elements of $R$ must not depend on the data vector. Some software uses 1 as the value of the residual variance and 0 as the values of each of the other variances and covariances.

Usually it is best to substitute values for $\sigma_\varepsilon^2, \sigma_1^2, \sigma_2^2, \ldots, \sigma_k^2$ into $\Sigma$ that are close to the true values of the parameters. One possible procedure is to obtain values from other experiments. Using fixed values (not dependent on the current data) as starting values for $\sigma^2$ in a noniterative process, that is, with zero iterations, provides a solution that is called the MIVQUE0 solution, where 0 denotes that no iterations have been performed. Swallow and Monahan (1984) used the method of moments estimates as initial values for the variances and used MIVQUE-A to describe the resulting estimators of the variance components. Another method is to use an iterative procedure (Brown, 1976) that starts from some initial values of the variance components, say $\sigma_{\varepsilon 0}^2, \sigma_{10}^2, \sigma_{20}^2, \ldots, \sigma_{k0}^2$. Use those initial values to evaluate $\Sigma$ and obtain $\hat{\sigma}^2_{(0)}$. Here $\hat{\sigma}^2_{(0)}$ depends on the initial values chosen. Then use $\hat{\sigma}^2_{(0)}$ to evaluate $\Sigma$ to obtain the second iteration estimate, $\hat{\sigma}^2_{(1)}$. Continue the iteration process until there is very little change from one iteration to the next. The resulting iterative MIVQUE values are no longer quadratic functions of $y$, since the elements of $\Sigma$ are functions of $y$. The final estimator of $\sigma^2$, say at step $m + 1$, can be called MIVQUE given the previous values of $\hat{\sigma}^2_{(m)}$. For balanced models, Swallow and Searle (1978) have shown that the equations simplify so that an explicit solution can be obtained. The solutions are identical to those provided by the method of moments. When there are unequal sample sizes and/or empty cells, the iterative procedure could be an appropriate method. A simulation study by Swallow and Searle (1978) indicates that REML, ML, and method of moments provide better estimates of the variance components than MIVQUE0.

The values of the MIVQUEs (evaluated either at constants for $\sigma^2$ or at a given previous step) are linear combinations of quadratic forms of $y$. Thus, the variance can be evaluated since the variance of a quadratic form $y'By$ is $2\,\mathrm{tr}[(B\Sigma)^2]$. Swallow and Searle (1978) show how to use the expressions to obtain variances for the estimators in an unbalanced one-way model. They computed and compared the variances of the MIVQUE and method of moments estimators for the unbalanced one-way model with varying numbers of populations, sample sizes, and values of the variances. The variances of the MIVQUE estimators were evaluated as if the true values of $\sigma_\varepsilon^2$ and $\sigma_u^2$ were used in the estimation process. The estimate of $\sigma_\varepsilon^2$ obtained from the method of moments was quite comparable to that from the MIVQUE method: the variance of the MIVQUE was no more than 4% smaller than the variance of the method-of-moments estimator.

For fairly balanced models ($n_i$ not too different), the variance of the MIVQUE of $\sigma_u^2$ was no more than 10% smaller than the variance of the method of moments estimator of $\sigma_u^2$. For many unbalanced sample sizes, the variance of the MIVQUE of $\sigma_u^2$ was as much as 60% smaller than that of the corresponding method-of-moments estimator. The variances of the estimators from the two methods would be much more similar if values other than the true $\sigma_\varepsilon^2$ and $\sigma_u^2$ were used in the estimation process. SAS-Mixed has a MIVQUE option, denoted by MIVQUE0, which is a noniterative method.

This section concludes with an example of an unbalanced one-way design using MIVQUE 
estimators. 
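19.4.2 Application. Example 19.6: MIVQUE for the Unbalanced One-Way Design

The equations from Swallow and Searle (1978) are presented for a general one-way model and are then applied to the data in Example 19.2. The model is

$$
y_{ij} = \mu + u_i + \varepsilon_{ij}, \quad i = 1, 2, \ldots, t \ \text{and} \ j = 1, 2, \ldots, n_i
$$

where $u \sim N(0, \sigma_u^2 I_t)$, $\varepsilon \sim N(0, \sigma_\varepsilon^2 I_N)$, and where $u$ and $\varepsilon$ are independent random variables with $N = \sum_{i=1}^{t} n_i$.

Define

$$
k_i = \frac{n_i}{\sigma_{\varepsilon 0}^2 + n_i \sigma_{u0}^2} \quad \text{and} \quad K = \frac{1}{\sum_{i=1}^{t} k_i}
$$

The elements of the matrix $S$ are

$$
s_{11} = \sum_{i=1}^{t} k_i^2 - 2K \sum_{i=1}^{t} k_i^3 + K^2 \left(\sum_{i=1}^{t} k_i^2\right)^2
$$

$$
s_{12} = \sum_{i=1}^{t} \frac{k_i^2}{n_i} - 2K \sum_{i=1}^{t} \frac{k_i^3}{n_i} + K^2 \left(\sum_{i=1}^{t} k_i^2\right)\left(\sum_{i=1}^{t} \frac{k_i^2}{n_i}\right)
$$

and

$$
s_{22} = \frac{N - t}{\sigma_{\varepsilon 0}^4} + \sum_{i=1}^{t} \frac{k_i^2}{n_i^2} - 2K \sum_{i=1}^{t} \frac{k_i^3}{n_i^2} + K^2 \left(\sum_{i=1}^{t} \frac{k_i^2}{n_i}\right)^2
$$

The elements of the vector $f$ are

$$
f_1 = \sum_{i=1}^{t} k_i^2 \left(\bar{y}_{i.} - K \sum_{i'=1}^{t} k_{i'} \bar{y}_{i'.}\right)^2
$$

and

$$
f_2 = \frac{1}{\sigma_{\varepsilon 0}^4}\left(\sum_{i=1}^{t}\sum_{j=1}^{n_i} y_{ij}^2 - \sum_{i=1}^{t} n_i \bar{y}_{i.}^2\right) + \sum_{i=1}^{t} \frac{k_i^2}{n_i}\left(\bar{y}_{i.} - K \sum_{i'=1}^{t} k_{i'} \bar{y}_{i'.}\right)^2
$$

For given values of $\sigma_{\varepsilon 0}^2$ and $\sigma_{u0}^2$, the MIVQUE estimators of $\sigma_\varepsilon^2$ and $\sigma_u^2$ are $\hat{\sigma}^2 = S^{-1}f$, or

$$
\hat{\sigma}_\varepsilon^2 = \frac{s_{11} f_2 - s_{12} f_1}{c} \quad \text{and} \quad \hat{\sigma}_u^2 = \frac{s_{22} f_1 - s_{12} f_2}{c}
$$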
TABLE 19.7

Proc Mixed Code to Compute MIVQUE0 Estimates of the Variance Components and Mean for the Data in Example 19.2

proc mixed method=mivque0 data=ex19_2 covtest cl;
   class variety;
   model damage=/solution;
   random variety;

Covariance Parameter                 Estimate   Standard Error   Lower CL   Upper CL
Variety $\sigma^2_{var}$             0.05638    0.05739          0.01505    2.4965
Residual $\sigma^2_\varepsilon$      0.06503    0.03109          0.03050    0.2216

Estimate of Mean       Standard Error   df   t-Value   Pr > |t|
$\hat{\mu}$ = 3.9906   0.1392           3    28.67     <0.0001


TABLE 19.8

Proc Mixed Code to Obtain MIVQUE0 Estimates of the Variance Components for the Two-Way Random Effects Model in Example 19.3

Proc Mixed data=ex19_3 covtest cl method=MIVQUE0;
   title2 "Using Maximum Likelihood";
   class row col;
   model y=/solution;
   random row col row*col;

Covariance Parameter   Estimate   Standard Error   Lower     Upper
Row                    0          —                —         —
Col                    0          —                —         —
Row × col              25.4682    9.9894           13.3850   66.1016
Residual               4.0023     1.5698           2.1035    10.3879

Estimate of Mean   Standard Error   df   t-Value   Pr > |t|
14.7094            2.1309           1    6.90      0.0916


where $c = s_{11}s_{22} - s_{12}^2$. The MIVQUE estimators of the two variance components for the data in Table 19.1 are in Table 19.7 and those of the four variance components for the data in Table 19.2 are in Table 19.8. These estimators were obtained using the noniterative solution from SAS-Mixed. The variances of the estimators are

$$
\mathrm{Var}(\hat{\sigma}_\varepsilon^2) = \frac{2s_{11}}{c}, \quad \mathrm{Var}(\hat{\sigma}_u^2) = \frac{2s_{22}}{c}, \quad \text{and} \quad \mathrm{Cov}(\hat{\sigma}_\varepsilon^2, \hat{\sigma}_u^2) = \frac{-2s_{12}}{c}
$$

The estimators are not very sensitive to the choice of $\sigma_{u0}^2$ and $\sigma_{\varepsilon 0}^2$, but the variances do depend on the choice of the initial values of $\sigma_{u0}^2$ and $\sigma_{\varepsilon 0}^2$. Table 19.9 contains the MIVQUE0 estimators and their variances for the data in Table 19.1 using the starting values listed. The variances of the estimators vary greatly and are extremely large when the starting values for $\sigma_{u0}^2$ and $\sigma_{\varepsilon 0}^2$ are far from the estimates $\hat{\sigma}_u^2$ and $\hat{\sigma}_\varepsilon^2$. If an iterative procedure is used, the solution converges to $\hat{\sigma}_\varepsilon^2 = 0.057003$ and $\hat{\sigma}_u^2 = 0.073155$ with $\mathrm{Var}(\hat{\sigma}_\varepsilon^2) = 0.000721$, $\mathrm{Var}(\hat{\sigma}_u^2) = 0.005694$, and $\mathrm{Cov}(\hat{\sigma}_\varepsilon^2, \hat{\sigma}_u^2) = -0.000235$. The iterative procedure was started at several values ($\sigma_{u0}^2 = 2$ and $\sigma_{\varepsilon 0}^2 = 1$, and $\sigma_{u0}^2 = 50$ and $\sigma_{\varepsilon 0}^2 = 1000$, among others). All choices for starting values converged to the above values in four iterations. After two iterations, the estimators were quite stable but the variances were still changing.
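The one-way MIVQUE computation of this example can be sketched directly from the quantities $k_i$, $K$, $S$, and $f$. The sketch below is an illustration under stated assumptions, not the SAS-Mixed implementation: the function name `mivque_one_way` is hypothetical, index 1 refers to the between-group component and index 2 to the residual component, the prior residual variance must be positive, and the returned variances assume the prior values equal the true values. The `example_19_2` lists are the observations as they appear in the JMP data table of Figure 19.1; the balanced data set is made up to check that balanced data reproduce the method of moments estimates whatever the prior values.

```python
def mivque_one_way(groups, sigma2_e0, sigma2_u0):
    """MIVQUE of (sigma2_e, sigma2_u) for the one-way random effects model.

    groups: list of lists of observations, one list per level (sizes may differ).
    sigma2_e0, sigma2_u0: prior values substituted into Sigma (sigma2_e0 > 0).
    Returns (sigma2_e_hat, sigma2_u_hat, var_e, var_u, cov_eu).
    """
    n = [len(g) for g in groups]
    ybar = [sum(g) / len(g) for g in groups]
    N, t = sum(n), len(groups)

    k = [ni / (sigma2_e0 + ni * sigma2_u0) for ni in n]
    K = 1.0 / sum(k)

    sum_k2 = sum(ki ** 2 for ki in k)
    sum_k2_n = sum(ki ** 2 / ni for ki, ni in zip(k, n))

    # Elements of S (index 1 = between groups, index 2 = residual).
    s11 = sum_k2 - 2 * K * sum(ki ** 3 for ki in k) + K ** 2 * sum_k2 ** 2
    s12 = (sum_k2_n - 2 * K * sum(ki ** 3 / ni for ki, ni in zip(k, n))
           + K ** 2 * sum_k2 * sum_k2_n)
    s22 = ((N - t) / sigma2_e0 ** 2
           + sum(ki ** 2 / ni ** 2 for ki, ni in zip(k, n))
           - 2 * K * sum(ki ** 3 / ni ** 2 for ki, ni in zip(k, n))
           + K ** 2 * sum_k2_n ** 2)

    # Elements of f.
    weighted_mean = K * sum(ki * yi for ki, yi in zip(k, ybar))
    dev = [yi - weighted_mean for yi in ybar]
    within_ss = sum((y - yi) ** 2 for g, yi in zip(groups, ybar) for y in g)
    f1 = sum(ki ** 2 * d ** 2 for ki, d in zip(k, dev))
    f2 = (within_ss / sigma2_e0 ** 2
          + sum(ki ** 2 / ni * d ** 2 for ki, ni, d in zip(k, n, dev)))

    # Solve the 2 x 2 system S * sigma2 = f.
    c = s11 * s22 - s12 ** 2
    sigma2_e = (s11 * f2 - s12 * f1) / c
    sigma2_u = (s22 * f1 - s12 * f2) / c
    return sigma2_e, sigma2_u, 2 * s11 / c, 2 * s22 / c, -2 * s12 / c


# Observations read from the JMP data table in Figure 19.1 (varieties A-D).
example_19_2 = [
    [3.9, 4.05, 4.25],        # variety A
    [3.6, 4.2, 4.05, 3.85],   # variety B
    [4.15, 4.6, 4.15, 4.4],   # variety C
    [3.35, 3.8],              # variety D
]
fit_a = mivque_one_way(example_19_2, 1.0, 2.0)
fit_b = mivque_one_way(example_19_2, 10.0, 20.0)

# Made-up balanced data: MIVQUE should equal the method of moments estimates.
balanced = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
bal_fit = mivque_one_way(balanced, 2.0, 5.0)
```

Multiplying both prior values by the same constant leaves the estimates unchanged and only rescales the reported variances, which is the pattern seen in the rows of Table 19.9 that start at (1, 2) and (10, 20).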
TABLE 19.9

MIVQUE0 Estimates of Variance Components for Example 19.2 for Selected Starting Values

$\sigma_{\varepsilon 0}^2$   $\sigma_{u0}^2$   $\hat{\sigma}_\varepsilon^2$   $\hat{\sigma}_u^2$   Var($\hat{\sigma}_\varepsilon^2$)   Var($\hat{\sigma}_u^2$)   Cov($\hat{\sigma}_\varepsilon^2$, $\hat{\sigma}_u^2$)
1.00000    0.00000    0.05648   0.05893   0.18988    0.07966     -0.05677
1.00000    2.00000    0.05655   0.07495   0.22150    3.63984     -0.07255
10.00000   20.00000   0.05655   0.07495   22.14991   363.98420   -7.25498
1.00000    5.00000    0.05646   0.07718   0.22208    18.97352    -0.07346
0.10000    0.50000    0.05646   0.07718   0.00222    0.18974     -0.00073
0.05644    0.06719    0.05670   0.07294   0.00070    0.00496     -0.00023
0.05700    0.07316    0.05667   0.07326   0.00072    0.00569     -0.00023
0.06503    0.05638    0.05683   0.07145   0.00093    0.00410     -0.00030


19.5 Estimating Variance Components Using JMP 

The estimates of the variance components can be obtained using the fit model option of the 
JMP software (SAS Institute, Inc., 2005). Figure 19.1 gives the data set for Example 19.2 
displayed in a JMP data table (which was imported from a SAS data set). On the Analyze 
menu, select fit model as shown in Figure 19.2. On the fit model screen, select damage to be 


FIGURE 19.1 Data set for Example 19.2 in JMP table (13 rows; columns variety and Damage).

Row   variety   Damage
1     A         3.9
2     B         3.6
3     C         4.15
4     D         3.35
5     A         4.05
6     B         4.2
7     C         4.6
8     D         3.8
9     A         4.25
10    B         4.05
11    C         4.15
12    B         3.85
13    C         4.4
FIGURE 19.2 JMP fit model table for Example 19.2 (Y: Damage; model effect: variety, Random; Personality: Standard Least Squares; Method: REML).


the Y variable and select variety to be a model effect. Use the attributes menu to specify 
that variety is a random effect. The default estimation method is REML, but EMS can be 
selected to provide method of moments estimators using type 3 sums of squares. Click the 
run model button to obtain the results in Figure 19.3. The estimates of the variance components and their estimated standard errors are similar to those from SAS in Table 19.5.
The main difference is that the confidence interval about the variety variance component 
is computed using the Wald method instead of the Satterthwaite approximation (see 
Chapter 20 for details). 

Figure 19.4 is the JMP data table for the data of Example 19.3. The fit model screen is 
shown in Figure 19.5 where row, col and row x col have been selected as random effects 
and the REML method is selected for estimation. The REML estimates of the variance 
components are shown in Figure 19.6, where the results are similar to those from SAS in 
Table 19.6. The fit model screen has another option that can be used where one can check 
the unbounded variance components box. This option does not restrict the solution to be 
in the parameter space (similar to using the unbounded option in SAS-Mixed). Figure 19.7 
has the unbounded variance components box checked and the solution is shown in Figure 
19.8. The solutions for the row and col variance components are negative and the Wald 
method is used to compute the confidence intervals (except for Residual). The JMP fit 
model process provides an appropriate analysis for models with random effects as does 
SAS-Mixed. 







FIGURE 19.4 JMP data table for Example 19.3. 
FIGURE 19.5 Fit model screen for JMP for REML estimates for Example 19.3.

FIGURE 19.6 Results from JMP using REML for Example 19.3.

Response y: Summary of Fit
RSquare                       0.782879
RSquare Adj                   0.782879
Root Mean Square Error        1.959525
Mean of Response              14.64286
Observations (or Sum Wgts)    14

Parameter Estimates
Term        Estimate    Std Error   DFDen   t Ratio   Prob>|t|
Intercept   14.854669   1.351066    4.967   10.99     0.0001*

REML Variance Component Estimates
Random Effect   Var Ratio   Var Component   Std Error   95% Lower   95% Upper   Pct of Total
row             0           0               0           0           0           0.000
col             0           0               0           0           0           0.000
row*col         2.4071077   9.2426655       6.9924503   3.1503281   95.737707   70.650
Residual                    3.8397391       1.9227483   1.7502033   14.128843   29.350
Total                       13.082405                                           100.000

-2 LogLikelihood = 66.382015466
FIGURE 19.7 JMP fit model screen with unbounded variance components selected for Example 19.3 (Y: y; model effects: row, col, and row*col, all Random; Method: REML; Unbounded Variance Components checked).



FIGURE 19.8 Unbounded results from REML of JMP for Example 19.3.

Summary of Fit
RSquare                       0.759536
RSquare Adj                   0.759536
Root Mean Square Error        1.954359
Mean of Response              14.64286
Observations (or Sum Wgts)    14

Parameter Estimates
Term        Estimate    Std Error   DFDen   t Ratio   Prob>|t|
Intercept   14.820768   0.238456    1       62.15     0.0102*

REML Variance Component Estimates
Random Effect   Var Ratio   Var Component   Std Error     95% Lower     95% Upper     Pct of Total
row             -0.163391   -0.624074       (illegible)   (illegible)   (illegible)   -5.732
col             -1.885275   -7.200847       6.9944529     -20.90997     6.5082807     -66.135
row*col         3.8993012   14.893461       13.702899     -11.96422     41.751143     136.787
Residual                    3.8195206       1.9434412     1.7235085     14.451341     35.080
Total                       10.888061                                                 100.000

-2 LogLikelihood = 64.242433164
19.6 Concluding Remarks

This chapter presented four methods, the method of moments, the maximum likelihood method, the residual maximum likelihood method, and the MIVQUE method, for obtaining estimates of the variance components of a random effects model. Two examples, a one-way random effects model and a two-way random effects model, were used to demonstrate each of the methods of estimation. When the data are balanced and the solution provides positive values, the estimates from the REML, MIVQUE0, and method of moments are identical. When the data sets are unbalanced, each method will produce different estimates. SAS-MIXED and JMP were used to carry out the computations.


19.7 Exercises 

19.1 The data in the following table are prices of coffee from five randomly selected 
states from the United States where four, five, or six cities were randomly selected 
per state and four to six stores per city were randomly selected. The price of 
brand x coffee was determined at each store. Write out a model to describe the 
data with states, cities nested within states and stores nested within cities and 
states. Provide REML, ML, MIVQUE0, and method of moments estimates of the
variance components for state, city and store using type I, type II, and type III 
sums of squares. Also obtain the estimate of the mean price of brand x coffee for 
the United States. 


Coffee Price Data for Exercise 19.1

State   City   Store 1   Store 2   Store 3   Store 4   Store 5   Store 6
1       1      2.78      2.78      2.80      —         —         —
1       2      2.93      2.89      2.91      2.90      —         —
1       3      2.73      2.70      2.74      2.74      2.72      2.74
1       4      2.93      2.92      2.93      2.89      2.93      2.92
1       5      2.94      2.93      2.94      —         —         —
2       1      2.20      2.20      2.16      2.18      —         —
2       2      2.11      2.08      2.11      2.09      —         —
2       3      2.02      2.03      2.06      —         —         —
2       4      1.98      1.96      2.02      1.98      1.99      —
2       5      2.05      2.03      2.04      2.09      2.02      —
2       6      2.10      2.12      2.09      2.09      —         —
3       1      2.18      2.20      2.22      2.22      —         —
3       2      1.97      2.00      1.98      1.99      1.97      —
3       3      2.11      2.12      2.13      2.13      2.10      —
3       4      2.10      2.08      2.07      2.08      —         —
4       3      2.35      2.37      2.40      2.36      2.37      2.40
4       4      2.34      2.41      2.33      2.32      2.32      —
4       5      2.34      2.33      2.38      2.34      —         —
5       1      2.24      2.21      2.24      2.19      —         —
5       2      2.23      2.17      2.18      2.18      2.17      —
5       3      2.22      2.21      2.21      2.20      2.21      —
5       4      2.24      2.27      2.26      —         —         —


19.2 The data in the following table are from a two-way random effects treatment 
structure. Write out an appropriate model and provide ML, REML, MIVQUE0,
and method of moments estimates of the variance components for rows, columns, 
row x column interaction and residual using type I, type II, and type III sums 
of squares. 


Data for Exercise 19.2 


Row Treatment 


Column Treatment 


t 1 

2 

3 

4 

1 

29 

— 

30 

29 

31 

— 

2 

22 

— 

29 

17 

16 

3 

34 

28 

_ 

25 


— 

26 

26 

— 

24 

4 


26 

30 


19 


— 

30 

29 

— 

19 

5 

22 

30 

22 

19 

24 


22 

— 

20 

21 


19 

— 

19 

21 





Methods for Making Inferences about 
Variance Components 


When a researcher designs an experiment involving factors that are random effects, she often wishes to make inferences about specific variance components specified in the model. In particular, if $\sigma_a^2$ is the variance component corresponding to the distribution of the levels of factor A, the experimenter may wish to determine if there is enough evidence to conclude that $\sigma_a^2 > 0$. An appropriate decision can be made by (1) testing the hypothesis $H_0$: $\sigma_a^2 = 0$ vs $H_a$: $\sigma_a^2 > 0$; (2) constructing a confidence interval about $\sigma_a^2$; or (3) constructing a lower confidence limit for $\sigma_a^2$. This chapter addresses these kinds of inference procedures for random effects models, where methods for hypothesis testing are described in Section 20.1 and the construction of confidence intervals (lower bounds) is described in Section 20.2. The construction of confidence intervals for variance components has been a fertile area of research and many authors have developed specialized confidence intervals for specific functions of the variance component parameters. The methods described in this chapter are available in current software and some are available for specific problems. The discussion is not exhaustive, but rather points to the types of confidence intervals that have been addressed. A more complete discussion is available in Burdick and Graybill (1992) as well as papers in the current statistical journals.


20.1 Testing Hypotheses 

There are two basic techniques for testing hypotheses about variance components. The first 
technique uses sums of squares from the analysis of variance table to construct F-statistics. 
For most balanced models, the F-statistics are distributed exactly as F-distributions, whereas 
for unbalanced models the distributions are approximated by F-distributions with the 
approximations becoming poorer as the designs become more unbalanced. The second 
technique is based on a likelihood-ratio test which is asymptotically distributed as a chi-square distribution. For balanced designs, the F-statistic approach is probably better than
the likelihood-ratio test, while for unbalanced designs there is no clear-cut choice. The
reader may wish to carry out a simulation experiment to study the distributions of test 
statistics using a data structure similar to the data set of interest before making a decision 
as to which method to use to test the hypothesis of interest. 

20.1.1 Using the Analysis of Variance Table 

If the data set is balanced, then sums of squares obtained by the usual analysis of variance are independently distributed as scalar multiples of chi-square random variables. Let $Q$ denote a sum of squares based on $\nu$ degrees of freedom where its expected mean square is a function of four variance components. That is, suppose

$$
E(Q/\nu) = \sigma_\varepsilon^2 + k_1\sigma_1^2 + k_2\sigma_2^2 + k_3\sigma_3^2
$$

Then, assuming the data follow a normal distribution,

$$
\frac{Q}{\sigma_\varepsilon^2 + k_1\sigma_1^2 + k_2\sigma_2^2 + k_3\sigma_3^2}
$$

is often distributed as a chi-square random variable with $\nu$ degrees of freedom. For many hypotheses of the form $H_0$: $\sigma_1^2 = 0$ vs $H_a$: $\sigma_1^2 > 0$, there are two independent sums of squares, denoted by $Q_1$ and $Q_2$, based on $\nu_1$ and $\nu_2$ degrees of freedom, respectively, with expectations

$$
E(Q_1/\nu_1) = \sigma_\varepsilon^2 + k_1\sigma_1^2 + k_2\sigma_2^2 + k_3\sigma_3^2
$$

and

$$
E(Q_2/\nu_2) = \sigma_\varepsilon^2 + k_2\sigma_2^2 + k_3\sigma_3^2
$$

The hypothesis $H_0$: $\sigma_1^2 = 0$ vs $H_a$: $\sigma_1^2 > 0$ is equivalent to

$$
H_0\colon E(Q_1/\nu_1) = E(Q_2/\nu_2) \quad \text{vs} \quad H_a\colon E(Q_1/\nu_1) > E(Q_2/\nu_2)
$$

The statistic used to test this hypothesis is $F = (Q_1/\nu_1)/(Q_2/\nu_2)$ which, under the conditions of $H_0$, is often distributed as a central F-distribution with $\nu_1$ and $\nu_2$ degrees of freedom. The hypothesis is rejected for large values of $F$. This process involves obtaining sums of squares and then using their expected mean squares to determine the appropriate divisor for each hypothesis of interest. The following two examples demonstrate this procedure.


20.1.2 Example 20.1: Two-Way Random Effects TS in a CR DS

A model for the two-way treatment structure with both factors random in a completely randomized design structure is

$$
y_{ijk} = \mu + a_i + b_j + c_{ij} + \varepsilon_{ijk} \quad \text{for } i = 1, 2, \ldots, a,\ j = 1, 2, \ldots, b, \ \text{and} \ k = 1, 2, \ldots, n
$$
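TABLE 20.1

Analysis of Variance Table for the Two-Way Random Effects Model of Example 20.1

Source of Variation   df               SS                                                                                                  EMS
A                     $a - 1$          $nb\sum_{i=1}^{a}(\bar{y}_{i..} - \bar{y}_{...})^2$                                                 $\sigma_\varepsilon^2 + n\sigma_c^2 + nb\sigma_a^2$
B                     $b - 1$          $na\sum_{j=1}^{b}(\bar{y}_{.j.} - \bar{y}_{...})^2$                                                 $\sigma_\varepsilon^2 + n\sigma_c^2 + na\sigma_b^2$
A × B                 $(a-1)(b-1)$     $n\sum_{i=1}^{a}\sum_{j=1}^{b}(\bar{y}_{ij.} - \bar{y}_{i..} - \bar{y}_{.j.} + \bar{y}_{...})^2$    $\sigma_\varepsilon^2 + n\sigma_c^2$
Residual              $ab(n - 1)$      $\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}(y_{ijk} - \bar{y}_{ij.})^2$                             $\sigma_\varepsilon^2$

Hypothesis                                            Test Statistic
$H_0$: $\sigma_a^2 = 0$ vs $H_a$: $\sigma_a^2 > 0$    F = MSA/MSAB
$H_0$: $\sigma_b^2 = 0$ vs $H_a$: $\sigma_b^2 > 0$    F = MSB/MSAB
$H_0$: $\sigma_c^2 = 0$ vs $H_a$: $\sigma_c^2 > 0$    F = MSAB/MSResidual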


where the $a_i \sim$ i.i.d. $N(0, \sigma_a^2)$, $b_j \sim$ i.i.d. $N(0, \sigma_b^2)$, $c_{ij} \sim$ i.i.d. $N(0, \sigma_c^2)$, $\varepsilon_{ijk} \sim$ i.i.d. $N(0, \sigma_\varepsilon^2)$, and the random variables $a_i$, $b_j$, $c_{ij}$, and $\varepsilon_{ijk}$ are independently distributed.

The analysis of variance table with sums of squares and expected mean squares for the model is shown in Table 20.1. Test statistics are constructed by examining the expected mean squares to select the proper numerators and denominators. The statistic used to test the hypothesis $H_0$: $\sigma_a^2 = 0$ vs $H_a$: $\sigma_a^2 > 0$ is constructed by setting $\sigma_a^2 = 0$ in the expected mean square for MSA. Next find another mean square that has the same expected mean square as does MSA when $H_0$ is true and use that mean square as the divisor. To test $H_0$: $\sigma_a^2 = 0$ vs $H_a$: $\sigma_a^2 > 0$ the appropriate divisor is MSAB, to test $H_0$: $\sigma_b^2 = 0$ vs $H_a$: $\sigma_b^2 > 0$ the appropriate divisor is MSAB, and to test $H_0$: $\sigma_c^2 = 0$ vs $H_a$: $\sigma_c^2 > 0$ the appropriate divisor is MSResidual. A decision rule is to reject $H_0$: $\sigma_a^2 = 0$ in favor of $H_a$: $\sigma_a^2 > 0$ if $F = \text{MSA}/\text{MSAB} > F_{\alpha,(a-1),(a-1)(b-1)}$, where $\alpha$ is the selected type I error rate. Test statistics can be determined similarly for $\sigma_b^2$ and $\sigma_c^2$. Table 20.1 contains a list of the hypotheses and the corresponding test statistics. Most likely, when the F-statistic does not exceed the specified percentage point, the conclusion is not that the variance component is zero, but rather that the magnitude of the variance component is negligible compared with the other sources of variation in the system.
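The construction in Table 20.1 can be sketched as a short Python function that computes the sums of squares for balanced data and forms each F-statistic with the divisor chosen from the expected mean squares. The function name `two_way_random_f_tests` and the 2 × 2 data set with two replicates are made up for illustration only; comparing each F to a percentage point of the F-distribution is left to tables or statistical software.

```python
def two_way_random_f_tests(y):
    """F statistics for a balanced two-way random effects model.

    y[i][j] is the list of n observations for level i of A and level j of B.
    Returns a dict mapping each variance component to
    (F statistic, numerator df, denominator df).
    """
    a, b, n = len(y), len(y[0]), len(y[0][0])

    cell = [[sum(y[i][j]) / n for j in range(b)] for i in range(a)]
    row = [sum(cell[i]) / b for i in range(a)]
    col = [sum(cell[i][j] for i in range(a)) / a for j in range(b)]
    grand = sum(row) / a

    # Sums of squares from Table 20.1.
    ss_a = n * b * sum((r - grand) ** 2 for r in row)
    ss_b = n * a * sum((c - grand) ** 2 for c in col)
    ss_ab = n * sum((cell[i][j] - row[i] - col[j] + grand) ** 2
                    for i in range(a) for j in range(b))
    ss_res = sum((obs - cell[i][j]) ** 2
                 for i in range(a) for j in range(b) for obs in y[i][j])

    df_a, df_b = a - 1, b - 1
    df_ab, df_res = df_a * df_b, a * b * (n - 1)
    ms_a, ms_b = ss_a / df_a, ss_b / df_b
    ms_ab, ms_res = ss_ab / df_ab, ss_res / df_res

    # Divisors follow the expected mean squares: MSAB for A and B,
    # MSResidual for the interaction.
    return {
        "sigma2_a": (ms_a / ms_ab, df_a, df_ab),
        "sigma2_b": (ms_b / ms_ab, df_b, df_ab),
        "sigma2_c": (ms_ab / ms_res, df_ab, df_res),
    }


# Made-up balanced data: 2 levels of A, 2 levels of B, n = 2 per cell.
tests = two_way_random_f_tests([[[1, 3], [4, 6]], [[5, 7], [12, 14]]])
```

Note that the main-effect tests use the interaction mean square, not the residual mean square, as the divisor; that is the distinguishing feature of the random effects analysis.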


20.1.3 Example 20.2: Complex Three-Way Random Effects TS

The data in Table 20.2 are from a design where the levels of the three factors in the treatment structure are random effects, the levels of A are crossed with the levels of B, and the levels of C are nested within the levels of B, all in a completely randomized design structure. A general model to describe a larger data set with a structure similar to that in Table 20.2 is

$$
y_{ijkm} = \mu + a_i + b_j + (ab)_{ij} + c_{k(j)} + (ac)_{ik(j)} + \varepsilon_{ijkm}
$$

for $i = 1, 2, \ldots, a$, $j = 1, 2, \ldots, b$, $k = 1, 2, \ldots, c$, and $m = 1, 2, \ldots, n$
TABLE 20.2

Data for Example 20.2

                        Factor A
Factor B   Factor C     A1        A2
B1         C1           20  20    25  26
B1         C2           23  22    26  27
B2         C3           36  34    38  36
B2         C4           39  38    40  39


The parameter $\mu$ denotes an overall mean, $a_i$ denotes the effect of level $i$ of factor A, $b_j$ denotes the effect of level $j$ of factor B, $(ab)_{ij}$ denotes the interaction between the levels of factor A and factor B, $c_{k(j)}$ denotes the effect of level $k$ of factor C nested within the $j$th level of factor B, $(ac)_{ik(j)}$ denotes the interaction between the levels of factor A and factor C nested within the levels of B, and $\varepsilon_{ijkm}$ denotes the experimental unit or sampling error. Under ideal conditions, $a_i \sim$ i.i.d. $N(0, \sigma_a^2)$, $b_j \sim$ i.i.d. $N(0, \sigma_b^2)$, $(ab)_{ij} \sim$ i.i.d. $N(0, \sigma_{ab}^2)$, $c_{k(j)} \sim$ i.i.d. $N(0, \sigma_{c(b)}^2)$, $(ac)_{ik(j)} \sim$ i.i.d. $N(0, \sigma_{ac(b)}^2)$, and $\varepsilon_{ijkm} \sim$ i.i.d. $N(0, \sigma_\varepsilon^2)$. Furthermore, $a_i$, $b_j$, $(ab)_{ij}$, $c_{k(j)}$, $(ac)_{ik(j)}$, and $\varepsilon_{ijkm}$ have independent distributions. The analysis of variance table with expected mean squares for the general situation is shown in Table 20.3. F-statistics can be constructed to test hypotheses about each of the variance components by examining the expected mean squares. The statistics used to test the following hypotheses are:

1) To test $H_0$: $\sigma_a^2 = 0$ vs $H_a$: $\sigma_a^2 > 0$ is $F_a$ = MSA/MSAB.

2) To test $H_0$: $\sigma_{ab}^2 = 0$ vs $H_a$: $\sigma_{ab}^2 > 0$ is $F_{ab}$ = MSAB/MSAC(B).

3) To test $H_0$: $\sigma_{c(b)}^2 = 0$ vs $H_a$: $\sigma_{c(b)}^2 > 0$ is $F_{c(b)}$ = MSC(B)/MSAC(B).

4) To test $H_0$: $\sigma_{ac(b)}^2 = 0$ vs $H_a$: $\sigma_{ac(b)}^2 > 0$ is $F_{ac(b)}$ = MSAC(B)/MSResidual.


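TABLE 20.3

Analysis of Variance Table for Example 20.2

Source of Variation   df                 SS                                                                                                                             EMS
A                     $a - 1$            $nbc\sum_{i=1}^{a}(\bar{y}_{i...} - \bar{y}_{....})^2$                                                                         $\sigma_\varepsilon^2 + n\sigma_{ac(b)}^2 + nc\sigma_{ab}^2 + nbc\sigma_a^2$
B                     $b - 1$            $nac\sum_{j=1}^{b}(\bar{y}_{.j..} - \bar{y}_{....})^2$                                                                         $\sigma_\varepsilon^2 + n\sigma_{ac(b)}^2 + na\sigma_{c(b)}^2 + nc\sigma_{ab}^2 + nac\sigma_b^2$
AB                    $(a-1)(b-1)$       $nc\sum_{i=1}^{a}\sum_{j=1}^{b}(\bar{y}_{ij..} - \bar{y}_{i...} - \bar{y}_{.j..} + \bar{y}_{....})^2$                          $\sigma_\varepsilon^2 + n\sigma_{ac(b)}^2 + nc\sigma_{ab}^2$
C(B)                  $b(c - 1)$         $na\sum_{j=1}^{b}\sum_{k=1}^{c}(\bar{y}_{.jk.} - \bar{y}_{.j..})^2$                                                            $\sigma_\varepsilon^2 + n\sigma_{ac(b)}^2 + na\sigma_{c(b)}^2$
AC(B)                 $(a-1)(c-1)b$      $n\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}(\bar{y}_{ijk.} - \bar{y}_{ij..} - \bar{y}_{.jk.} + \bar{y}_{.j..})^2$             $\sigma_\varepsilon^2 + n\sigma_{ac(b)}^2$
Residual              $(n-1)abc$         $\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}\sum_{m=1}^{n}(y_{ijkm} - \bar{y}_{ijk.})^2$                                        $\sigma_\varepsilon^2$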
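As a check on the sums of squares in Table 20.3, the following Python sketch evaluates each formula for the data of Table 20.2 (a = b = c = n = 2). The nested-list layout `y[i][j][k]` and the helper name `mean` are choices made for this illustration; the resulting sums of squares can be compared with the SS column reported by SAS-Mixed in Table 20.4.

```python
# y[i][j][k] holds the n = 2 observations for level i of A, level j of B,
# and level k of C nested within B (data of Table 20.2).
y = [
    [[[20, 20], [23, 22]], [[36, 34], [39, 38]]],  # A1: B1 (C1, C2), B2 (C3, C4)
    [[[25, 26], [26, 27]], [[38, 36], [40, 39]]],  # A2: B1 (C1, C2), B2 (C3, C4)
]
a, b, c, n = 2, 2, 2, 2
I, J, K = range(a), range(b), range(c)


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Means over the subscripts replaced by dots (balanced data).
m_ijk = [[[mean(y[i][j][k]) for k in K] for j in J] for i in I]
m_ij = [[mean(m_ijk[i][j]) for j in J] for i in I]
m_jk = [[mean(m_ijk[i][j][k] for i in I) for k in K] for j in J]
m_i = [mean(m_ij[i]) for i in I]
m_j = [mean(m_ij[i][j] for i in I) for j in J]
m = mean(m_i)

ss = {
    "A": n * b * c * sum((m_i[i] - m) ** 2 for i in I),
    "B": n * a * c * sum((m_j[j] - m) ** 2 for j in J),
    "AB": n * c * sum((m_ij[i][j] - m_i[i] - m_j[j] + m) ** 2
                      for i in I for j in J),
    "C(B)": n * a * sum((m_jk[j][k] - m_j[j]) ** 2 for j in J for k in K),
    "AC(B)": n * sum((m_ijk[i][j][k] - m_ij[i][j] - m_jk[j][k] + m_j[j]) ** 2
                     for i in I for j in J for k in K),
    "Residual": sum((obs - m_ijk[i][j][k]) ** 2
                    for i in I for j in J for k in K for obs in y[i][j][k]),
}
df = {
    "A": a - 1,
    "B": b - 1,
    "AB": (a - 1) * (b - 1),
    "C(B)": b * (c - 1),
    "AC(B)": (a - 1) * (c - 1) * b,
    "Residual": (n - 1) * a * b * c,
}
ms = {source: ss[source] / df[source] for source in ss}
```

Because the design is balanced, the type I sums of squares used in Table 20.4 coincide with these formulas.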
However, there is no F-statistic to test $H_0$: $\sigma_b^2 = 0$ vs $H_a$: $\sigma_b^2 > 0$ since none of the mean squares not involving $\sigma_b^2$ has the expected value $\sigma_\varepsilon^2 + n\sigma_{ac(b)}^2 + na\sigma_{c(b)}^2 + nc\sigma_{ab}^2$, which is the expected value of MSB when $\sigma_b^2 = 0$. But there is a linear combination of mean squares (not including MSB) that has the desired expectation, that is, $E[\text{MSC(B)} + \text{MSAB} - \text{MSAC(B)}] = \sigma_\varepsilon^2 + n\sigma_{ac(b)}^2 + na\sigma_{c(b)}^2 + nc\sigma_{ab}^2$. Let $Q = \text{MSC(B)} + \text{MSAB} - \text{MSAC(B)}$; then the statistic to test $H_0$: $\sigma_b^2 = 0$ vs $H_a$: $\sigma_b^2 > 0$ is $F_b = \text{MSB}/Q$. The sampling distribution of $F_b$ can be approximated with an F-distribution with $b - 1$ and $r$ degrees of freedom. The denominator degrees of freedom, $r$, are determined by approximating the distribution of $rQ/E(Q)$ by a chi-square distribution using the Satterthwaite (1946) approximation discussed in Chapter 2.
The Satterthwaite approximation is used to approximate the sampling distribution of $Q = q_1 MS_1 + q_2 MS_2 + \cdots + q_k MS_k$ where $MS_i$ denotes a mean square based on $f_i$ degrees of freedom, the mean squares are independently distributed, and the $q_i$ are known constants. Then $rQ/E(Q)$ is approximately distributed as a central chi-square random variable based on $r$ degrees of freedom where

$$
r = \frac{Q^2}{\sum_{i=1}^{k} (q_i MS_i)^2 / f_i}
$$

Assume $U$ is a mean square based on $f$ degrees of freedom that is distributed independently of $MS_1, MS_2, \ldots, MS_k$ with expectation $E(U) = E(Q) + k_0\sigma_0^2$. The statistic to test $H_0$: $\sigma_0^2 = 0$ vs $H_a$: $\sigma_0^2 > 0$ is $F = U/Q$, which is approximately distributed as an F-distribution with $f$ and $r$ degrees of freedom.

The statistic to test $H_0$: $\sigma_b^2 = 0$ vs $H_a$: $\sigma_b^2 > 0$ is $F_b = \text{MSB}/Q$, which is approximately distributed as an F-distribution with $b - 1$ and $r$ degrees of freedom where

$$
r = \frac{Q^2}{\dfrac{[\text{MSC(B)}]^2}{b(c-1)} + \dfrac{[\text{MSAB}]^2}{(a-1)(b-1)} + \dfrac{[\text{MSAC(B)}]^2}{b(a-1)(c-1)}}
$$

Table 20.4 contains the analysis of variance table for the data in Table 20.2, which includes the divisors (denoted by error terms) and the F-statistics for testing the respective hypotheses. To test $H_0\colon \sigma_b^2 = 0$ vs $H_a\colon \sigma_b^2 > 0$, let

$$Q = MSC(B) + MSAB - MSAC(B) = 12.0625 + 10.5625 - 0.8125 = 21.8125$$

The degrees of freedom corresponding to $Q$ are

$$r = \frac{(21.8125)^2}{(12.0625)^2/2 + (10.5625)^2/1 + (0.8125)^2/2} = \frac{475.7852}{184.6484} = 2.5767$$


The test statistic, $F = 770.0625/21.8125 = 35.304$, is based on 1 and 2.5767 degrees of freedom. The significance level of the test is 0.0144, indicating that there is evidence to believe that
TABLE 20.4

Analysis of Variance Table Using Type I Sums of Squares with Proc Mixed Code for the Data from Example 20.2

PROC MIXED data=EX_20 METHOD=TYPE1 COVTEST CL;
CLASS A B C;
TITLE2 'METHOD=TYPE1';
MODEL Y=;
RANDOM A B A*B C(B) A*C(B);

Source     df  SS        MS        Error Term                         Error df  F-Value  Pr>F
A          1   39.0625   39.0625   MS(A×B)                            1         3.70     0.3053
B          1   770.0625  770.0625  MS(A×B) + MS[C(B)] - MS[A×C(B)]    2.5767    35.30    0.0144
A×B        1   10.5625   10.5625   MS[A×C(B)]                         2         13.00    0.0691
C(B)       2   24.1250   12.0625   MS[A×C(B)]                         2         14.85    0.0631
A×C(B)     2   1.6250    0.8125    MS(Residual)                       8         1.00     0.4096
Residual   8   6.5000    0.8125    —                                  —         —        —


$\sigma_b^2 > 0$, or that the variation due to the population of levels of factor B is an important part of the total variation in the system.
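
The arithmetic behind this approximate test can be checked with a few lines of code. The following is a minimal Python sketch, not part of the SAS analysis; the function name `satterthwaite_df` is an illustrative choice, and the mean squares and their degrees of freedom are copied from Table 20.4.

```python
def satterthwaite_df(coefs, mean_squares, dfs):
    """Return (Q, r) for Q = sum(q_i * MS_i), assuming independent mean squares.

    r is the Satterthwaite approximate degrees of freedom:
    r = Q^2 / sum((q_i * MS_i)^2 / f_i).
    """
    q = sum(c * ms for c, ms in zip(coefs, mean_squares))
    denom = sum((c * ms) ** 2 / f for c, ms, f in zip(coefs, mean_squares, dfs))
    return q, q ** 2 / denom


# Q = MSC(B) + MSAB - MSAC(B), with values taken from Table 20.4
coefs = [1.0, 1.0, -1.0]
mean_squares = [12.0625, 10.5625, 0.8125]
dfs = [2, 1, 2]

q, r = satterthwaite_df(coefs, mean_squares, dfs)
f_b = 770.0625 / q  # MSB / Q

print(f"Q = {q:.4f}, r = {r:.4f}, F_b = {f_b:.3f}")
```

The computed denominator degrees of freedom agree with the Error df reported for B in Table 20.4.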

To test hypotheses about variance components in balanced designs, the F-test constructed 
from a ratio of two mean squares should be used whenever possible. When the ratio of two 
mean squares cannot be used, the Satterthwaite approximation is an acceptable alternative. 

Some sort of Satterthwaite approximation is almost always necessary to test hypotheses 
about the variance components when the design is not balanced. Additionally, the sums of 
squares in the analysis of variance table may not have independent distributions, although 
sets of sums of squares may be independent for some special cases. The residual or error 
sum of squares is always independent of the other sums of squares in the analysis of 
variance table. Thus, for any mean square $U$ with expectation $\sigma_\varepsilon^2 + k_0\sigma_0^2$, the statistic $F_0 = U/MSResidual$ provides a test of the hypothesis $H_0\colon \sigma_0^2 = 0$ vs $H_a\colon \sigma_0^2 > 0$. Under the conditions of $H_0$, $F_0$ is distributed as a central F-distribution with $u$ and $v$ degrees of freedom, where $u$ is the degrees of freedom associated with $U$ and $v$ is the degrees of freedom associated with MSResidual.

Mean squares whose expectations involve more than two variance components generally cannot be used to obtain test statistics about single variance components that have exact F sampling distributions. The exact F-distributions occur for some balanced designs, as demonstrated in the previous two examples. One reason the ratios are not exactly distributed as F is that the respective mean squares are not independently distributed. If the design is not too unbalanced, then using the F-distribution as an approximation should be adequate. Additionally, the sums of squares (other than the residual) are not distributed as scalar multiples of chi-square distributions when the design is unbalanced.

In general, to test $H_0\colon \sigma_1^2 = 0$ vs $H_a\colon \sigma_1^2 > 0$, there will be one mean square, denoted by $U_1$, with expectation

$$E(U_1) = \sigma_\varepsilon^2 + k_{11}\sigma_1^2 + k_{1b}\sigma_b^2 + k_{1c}\sigma_c^2$$

but there will be no other mean square that has expectation $\sigma_\varepsilon^2 + k_{1b}\sigma_b^2 + k_{1c}\sigma_c^2$; that is, there is no single mean square that is the appropriate divisor. The method is to find a linear combination of other mean squares, say $Q = \sum_i q_i MS_i$, where $E(Q) = \sigma_\varepsilon^2 + k_{1b}\sigma_b^2 + k_{1c}\sigma_c^2$.
TABLE 20.5

Analysis of Variance Table with Expected Mean Squares and F-Statistic for the Data in Example 19.2

proc mixed method=type3 data=ex19_1 covtest cl;
class variety;
model damage=/solution;
random variety;

Source    df  SS        MS        EMS                                  Error Term    Error df  F-Value  Pr > F
Variety   3   0.810160  0.270053  Var(Residual) + 3.1795 Var(Variety)  MS(Residual)  9         4.79     0.0293
Residual  9   0.507917  0.056435  Var(Residual)                        —             —         —        —


The Satterthwaite approximation can be used to approximate the sampling distribution of $Q$, that is, to find $r$ such that $rQ/E(Q)$ is approximately distributed as a chi-square random variable with $r$ degrees of freedom. The approximation is twofold, since (1) the degrees of freedom are approximated and (2) the mean squares making up $Q$ are not necessarily independently distributed as chi-square random variables, as required by the approximation.

The SAS®-Mixed code and the resulting analysis of variance table using type III sums of squares for the wheat insect damage data of Example 19.2 are displayed in Table 20.5. The expected value of the variety mean square is $\sigma_\varepsilon^2 + 3.1795\sigma_{var}^2$. To test the hypothesis $H_0\colon \sigma_{var}^2 = 0$ vs $H_a\colon \sigma_{var}^2 > 0$, the appropriate divisor is the residual mean square, which provides an F-statistic of 4.79. The computed F-statistic is compared to an F-distribution with 3 and 9 degrees of freedom; it has a significance level of 0.0293. Since this is a one-way experiment, the type I analysis would have been identical to this type III analysis.

The SAS-Mixed code and analysis of variance table constructed from the type I sums of squares for the two-way random effects data of Example 19.3 are displayed in Table 20.6. To test the hypothesis $H_0\colon \sigma_{row\times col}^2 = 0$ vs $H_a\colon \sigma_{row\times col}^2 > 0$, the appropriate divisor is the residual mean square, which provides an F-statistic of 14.24. The computed F-statistic is compared with an F-distribution with 2 and 8 degrees of freedom, providing a significance level of 0.0023. There are no exact tests available for testing $H_0\colon \sigma_{row}^2 = 0$ vs $H_a\colon \sigma_{row}^2 > 0$ and $H_0\colon \sigma_{col}^2 = 0$ vs $H_a\colon \sigma_{col}^2 > 0$; thus approximate tests need to be constructed. The appropriate divisor for MSRow used to test $H_0\colon \sigma_{row}^2 = 0$ vs $H_a\colon \sigma_{row}^2 > 0$ is calculated as

$$Q_{row} = \frac{0.1429}{4.5714}\,MSCol + \frac{1}{2.2588}\left[2.4286 - \frac{0.1429}{4.5714}\times 2.3126\right] MSRow{\times}Col + \left\{1 - \frac{0.1429}{4.5714} - \frac{1}{2.2588}\left[2.4286 - \frac{0.1429}{4.5714}\times 2.3126\right]\right\} MSResidual$$

$$= 0.0313\,MSCol + 1.0432\,MSRow{\times}Col - 0.0744\,MSResidual$$

$$= 56.8729$$


The Satterthwaite approximate degrees of freedom associated with $Q_{row}$ are computed as

$$df_{Q_{row}} = \frac{(Q_{row})^2}{\dfrac{(0.0313 \times MSCol)^2}{2} + \dfrac{(1.0432 \times MSRow{\times}Col)^2}{2} + \dfrac{(0.0744 \times MSResidual)^2}{8}} = 1.9961$$
TABLE 20.6

Type I Analysis of Variance with Expected Mean Squares and Error Terms Used in Computing the F-Statistics for the Two-Way Random Effects Data of Example 19.3

proc mixed data=ex19_2 method=type1 ic cl covtest asycov;
class row col;
model y=;
random row col row*col;

Source        df  SS          MS         Error df  F-Value  Pr>F
Row           1   0.642857    0.642857   1.9961    0.01     0.9251
Column        2   14.759664   7.379832   1.9935    0.13     0.8832
Row × column  2   109.145098  54.572549  8         14.24    0.0023
Residual      8   30.666667   3.833333   —         —        —

Source        EMS
Row           Var(Residual) + 2.4286 Var(row × col) + 0.1429 Var(col) + 7 Var(row)
Column        Var(Residual) + 2.3126 Var(row × col) + 4.5714 Var(col)
Row × column  Var(Residual) + 2.2588 Var(row × col)
Residual      Var(Residual)

Source        Error Term
Row           0.0313 MS(col) + 1.0432 MS(row × col) - 0.0744 MS(Residual)
Column        1.0238 MS(row × col) - 0.0238 MS(Residual)
Row × column  MS(Residual)
Residual      —



The resulting F-statistic is $F_{row} = MSRow/Q_{row} = 0.0113$ with a significance level of 0.9251. The appropriate divisor for MSCol used to test $H_0\colon \sigma_{col}^2 = 0$ vs $H_a\colon \sigma_{col}^2 > 0$ is calculated as

$$Q_{col} = \frac{2.3126}{2.2588}\,MSRow{\times}Col + \left[1 - \frac{2.3126}{2.2588}\right] MSResidual$$

$$= 1.0238\,MSRow{\times}Col - 0.0238\,MSResidual$$

$$= 55.7806$$

The Satterthwaite approximate degrees of freedom associated with $Q_{col}$ are computed as

$$df_{Q_{col}} = \frac{(Q_{col})^2}{\dfrac{(1.0238 \times MSRow{\times}Col)^2}{2} + \dfrac{(0.0238 \times MSResidual)^2}{8}} = 1.9935$$

The resulting F-statistic is $F_{col} = MSCol/Q_{col} = 0.1323$ with a significance level of 0.8832.
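
The coefficients of the two divisors follow mechanically from the expected mean squares in Table 20.6: each coefficient is chosen so that the divisor has the same expectation as the numerator mean square with the tested variance component removed. The Python sketch below illustrates this under that reading; the helper `sat_df` and all variable names are illustrative assumptions, and the inputs are the rounded values printed in Table 20.6.

```python
# Mean squares and degrees of freedom from Table 20.6 (Type I analysis)
ms_row, ms_col, ms_rc, ms_res = 0.642857, 7.379832, 54.572549, 3.833333
df_col, df_rc, df_res = 2, 2, 8


def sat_df(terms):
    """terms: list of (coefficient, mean square, df). Returns (Q, Satterthwaite df)."""
    q = sum(c * ms for c, ms, _ in terms)
    return q, q ** 2 / sum((c * ms) ** 2 / f for c, ms, f in terms)


# Divisor for MSRow: match 0.1429 Var(col) and 2.4286 Var(row x col), then make
# the coefficients sum to 1 so that Var(Residual) has coefficient 1.
q_c = 0.1429 / 4.5714
q_rc = (2.4286 - q_c * 2.3126) / 2.2588
q_e = 1.0 - q_c - q_rc
Q_row, df_Q_row = sat_df([(q_c, ms_col, df_col), (q_rc, ms_rc, df_rc), (q_e, ms_res, df_res)])

# Divisor for MSCol: match 2.3126 Var(row x col); Var(Residual) again gets coefficient 1.
p_rc = 2.3126 / 2.2588
p_e = 1.0 - p_rc
Q_col, df_Q_col = sat_df([(p_rc, ms_rc, df_rc), (p_e, ms_res, df_res)])

F_row = ms_row / Q_row
F_col = ms_col / Q_col
print(f"row: coefficients {q_c:.4f}, {q_rc:.4f}, {q_e:.4f}; df = {df_Q_row:.4f}; F = {F_row:.4f}")
print(f"col: coefficients {p_rc:.4f}, {p_e:.4f}; df = {df_Q_col:.4f}; F = {F_col:.4f}")
```

Small differences in the last digit relative to the SAS output are expected, because the inputs here are rounded to the precision shown in the table.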


20.1.4 Likelihood Ratio Test 

The second method for testing hypotheses about variance components is based on a likelihood ratio procedure, which involves evaluating the likelihood function for the complete model and evaluating the likelihood function for the model under the conditions of $H_0$.

The general random model of Equation 18.3 is

$$y = j_n\mu + Z_1u_1 + Z_2u_2 + \cdots + Z_ku_k + \varepsilon$$

where $u_1 \sim N(0, \sigma_1^2 I_{t_1})$, $u_2 \sim N(0, \sigma_2^2 I_{t_2})$, ..., $u_k \sim N(0, \sigma_k^2 I_{t_k})$, $\varepsilon \sim N(0, \sigma_\varepsilon^2 I_n)$, and the random variables are all independently distributed. The distributional assumptions imply that the marginal distribution of $y$ is $N(j_n\mu, \Sigma)$, where $\Sigma = \sigma_\varepsilon^2 I_n + \sigma_1^2 Z_1Z_1' + \sigma_2^2 Z_2Z_2' + \cdots + \sigma_k^2 Z_kZ_k'$. The likelihood function is


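$$L(\mu, \sigma_\varepsilon^2, \sigma_1^2, \sigma_2^2, \ldots, \sigma_k^2 \mid y) = (2\pi)^{-n/2}\,|\Sigma|^{-1/2} \exp\left[-\tfrac{1}{2}(y - j_n\mu)'\Sigma^{-1}(y - j_n\mu)\right]$$

The likelihood function subject to the conditions of $H_0\colon \sigma_1^2 = 0$ is

$$L_0(\mu, \sigma_\varepsilon^2, 0, \sigma_2^2, \ldots, \sigma_k^2 \mid y) = (2\pi)^{-n/2}\,|\Sigma_0|^{-1/2} \exp\left[-\tfrac{1}{2}(y - j_n\mu)'\Sigma_0^{-1}(y - j_n\mu)\right]$$

where $\Sigma_0 = \sigma_\varepsilon^2 I_n + \sigma_2^2 Z_2Z_2' + \sigma_3^2 Z_3Z_3' + \cdots + \sigma_k^2 Z_kZ_k'$.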

The process is to obtain maximum likelihood estimators for the parameters of both likelihood functions and evaluate each likelihood function at the values of its estimators. The likelihood ratio test statistic is

$$LR(\sigma_1^2 = 0) = \frac{L_0(\hat\mu_0, \hat\sigma_{\varepsilon 0}^2, 0, \hat\sigma_{20}^2, \hat\sigma_{30}^2, \ldots, \hat\sigma_{k0}^2 \mid y)}{L(\hat\mu, \hat\sigma_\varepsilon^2, \hat\sigma_1^2, \hat\sigma_2^2, \ldots, \hat\sigma_k^2 \mid y)}$$

where $\hat\sigma_{20}^2$ denotes the maximum likelihood estimate of $\sigma_2^2$ from the likelihood function under the conditions of $H_0\colon \sigma_1^2 = 0$. When $H_0\colon \sigma_1^2 = 0$ is true, the asymptotic sampling distribution of

$$-2\log[LR(\sigma_1^2 = 0)] = -2\log_e[L_0(\hat\mu_0, \hat\sigma_{\varepsilon 0}^2, 0, \hat\sigma_{20}^2, \ldots, \hat\sigma_{k0}^2 \mid y)] + 2\log_e[L(\hat\mu, \hat\sigma_\varepsilon^2, \hat\sigma_1^2, \hat\sigma_2^2, \ldots, \hat\sigma_k^2 \mid y)]$$

is central chi-square with 1 degree of freedom. The reason there is one degree of freedom is that there is one less parameter in $L_0(\cdot)$ than in $L(\cdot)$. The decision rule is to reject $H_0$ if $-2\log[LR(\sigma_1^2 = 0)] > \chi^2_{\alpha,1}$. Likelihood ratio test statistics can be computed using SAS-Mixed, where METHOD = ML is used as the variance component estimation procedure.


20.1.5 Example 20.3: Wheat Varieties—One-Way Random Effects Model 

The SAS-Mixed code with results for obtaining maximum likelihood estimates of the parameters for the full model of Example 19.2 are given in Table 20.7. The maximum likelihood estimates of the parameters for the model describing the data in Example 19.2 are $\hat\mu = 3.9909$, $\hat\sigma_\varepsilon^2 = 0.05749$, and $\hat\sigma_{var}^2 = 0.04855$, and the value of $-2\log_e L(\hat\mu, \hat\sigma_\varepsilon^2, \hat\sigma_{var}^2 \mid y)$ is 4.96762. The reduced model is fit to the data using the SAS-Mixed code in Table 20.8, where the "Random Variety;" statement was excluded from the model in Table 20.7. Under the conditions of $H_0\colon \sigma_{var}^2 = 0$, the maximum likelihood estimates of the parameters are $\hat\mu_0 = 4.0269$, $\hat\sigma_{\varepsilon 0}^2 = 0.1014$, and $\hat\sigma_{var0}^2 = 0$. The value of $-2\log_e L_0(\hat\mu_0, \hat\sigma_{\varepsilon 0}^2, 0 \mid y)$ is 7.13832. The value of $-2\log_e$ of the likelihood ratio for testing $H_0$ is 7.13832 - 4.96762 = 2.1707. The value 2.171 is compared with percentage points of a central chi-square distribution with one degree of
TABLE 20.7

Proc Mixed Code and Results for Fitting the Full Model for Example 19.2 Using Method = ML to Evaluate Likelihood Function

Proc mixed method=ml data=ex19_1 covtest cl ic;
class variety;
model damage=/solution;
random variety;

Covariance Parameter Estimates

Covariance Parameter  Estimate  Standard Error  Z-Value  Pr Z    α     Lower    Upper
Variety               0.04855   0.05075         0.96     0.1693  0.05  0.01267  2.5700
Residual              0.05749   0.02760         2.08     0.0186  0.05  0.02690  0.1972

Solution for Fixed Effects

Effect     Estimate  Standard Error  df  t-Value  Pr > |t|
Intercept  3.9909    0.1297          3   30.78    <0.0001

-2 loglike  4.967616


TABLE 20.8

Proc Mixed Code and Results for Fitting the Reduced Model for Example 19.2 Using Method = ML to Evaluate Likelihood Function when $\sigma_{var}^2 = 0$

Proc mixed method=ml data=ex19_1 covtest cl ic;
class variety;
model damage=/solution;

Covariance Parameter Estimates

Covariance Parameter  Estimate  Standard Error  Z-Value  Pr Z    α     Lower    Upper
Residual              0.1014    0.03977         2.55     0.0054  0.05  0.05329  0.2632

Solution for Fixed Effects

Effect     Estimate  Standard Error  df  t-Value  Pr > |t|
Intercept  4.0269    0.08831         12  45.60    <0.0001

-2 loglike  7.138320


freedom. The significance level for this test is 0.1407. The F-test constructed using the expected mean squares in Table 20.5 provided a significance level of 0.0293. The sampling distribution of the likelihood ratio test is an asymptotic distribution that is acceptable for large sample sizes, whereas in this example the sample size is small. The sampling distribution for the F-test given previously is exact. F-tests for other variance components in models involving more than two variance components are approximate small-sample tests; they are generally quite adequate and are likely better for small sample sizes than tests based on asymptotic distributions.
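
The likelihood ratio computation for this example is small enough to reproduce directly. The sketch below is an illustration in Python rather than SAS; the function name `lrt_one_df` is hypothetical. It relies on the identity $P(\chi^2_1 > x) = \mathrm{erfc}(\sqrt{x/2})$, which holds for one degree of freedom only, and takes the two -2 log-likelihood values from Tables 20.7 and 20.8.

```python
import math


def lrt_one_df(neg2loglik_reduced, neg2loglik_full):
    """Likelihood ratio statistic and its chi-square(1) upper-tail probability."""
    stat = neg2loglik_reduced - neg2loglik_full
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# -2 log-likelihood values from Table 20.8 (reduced) and Table 20.7 (full)
stat, p = lrt_one_df(7.13832, 4.96762)
print(f"-2 log(LR) = {stat:.4f}, significance level = {p:.4f}")
```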
TABLE 20.9

Maximum Likelihood Estimates of the Variance Components for Various Models So That Likelihood Ratio Test Statistics Can Be Computed to Test the Hypothesis That Each Individual Variance Component Is Zero

Full model:

Proc Mixed data=ex19_3 method=ML;
title2 "Using Maximum Likelihood";
class row col;
model y=/solution;
random row col row*col;

Model without row*col:

Proc Mixed data=ex19_3 method=ML;
title2 "Using Maximum Likelihood without row*col";
class row col;
model y=/solution;
random row col;

Covariance    Full Model  Model without          Model without  Model without
Parameter     Estimate    Row × Column Estimate  Row Estimate   Column Estimate
Row           0           0                      0              0
Column        0           0                      0              0
Row × column  7.4065      0                      7.4065         7.4065
Residual      3.8430      0                      3.8430         3.8430
Intercept     14.8476     11.0867                14.8476        14.8476
-2 loglike    68.7        73.4                   68.7           68.7



20.1.6 Example 20.4: Unbalanced Two-Way 

The likelihood ratio statistic to test the hypothesis $H_0\colon \sigma_{row\times col}^2 = 0$ vs $H_a\colon \sigma_{row\times col}^2 > 0$ for the two-way random effects data of Example 19.3 is obtained by fitting a model with all terms in the random statement and then fitting the model without the row × col term in the random statement, as displayed in Table 20.9. The SAS-Mixed code for fitting the two models is included in Table 20.9. The results from fitting models without row and without column effects are also included without the corresponding SAS-Mixed code. Maximum likelihood estimates of the variance components, estimates of the intercepts, and the -2 log(likelihood) values are given. The likelihood ratio test statistic for the hypothesis $H_0\colon \sigma_{row\times col}^2 = 0$ vs $H_a\colon \sigma_{row\times col}^2 > 0$ is 73.4 - 68.7 = 4.7, which has a chi-square sampling distribution with one degree of freedom under the conditions of $H_0$. The significance level is 0.030, indicating there is sufficient information to believe that $\sigma_{row\times col}^2 > 0$. The values of the likelihood ratio statistics are both zero for testing the hypotheses that the row and the column variance components are equal to zero. This occurs because the maximum likelihood estimates of both variance components are zero. So, for testing that the row variance component is zero, the value of -2 log(likelihood) remains the same whether row is included in the random statement or not.


20.2 Constructing Confidence Intervals 

There are a few procedures that provide exact confidence intervals about some of the variance components in some models, but most of the time the confidence intervals that are obtained are approximate and rely on some type of approximation.
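20.2.1 Residual Variance $\sigma_\varepsilon^2$

For the general random model, a $(1 - \alpha)100\%$ confidence interval about $\sigma_\varepsilon^2$ is

$$\frac{\nu\hat\sigma_\varepsilon^2}{\chi^2_{\alpha/2,\nu}} \le \sigma_\varepsilon^2 \le \frac{\nu\hat\sigma_\varepsilon^2}{\chi^2_{1-(\alpha/2),\nu}}$$

where $\hat\sigma_\varepsilon^2 = SSRESIDUAL/\nu$, $\nu$ is the degrees of freedom associated with $\hat\sigma_\varepsilon^2$, and $\chi^2_{1-(\alpha/2),\nu}$ and $\chi^2_{\alpha/2,\nu}$ denote lower and upper $\alpha/2$ percentage points from a chi-square distribution with $\nu$ degrees of freedom.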


20.2.2 General Satterthwaite Approximation 

The general Satterthwaite approximation to the sampling distribution associated with a variance component is obtained by equating the first two moments of $r\hat\sigma_s^2/E(\hat\sigma_s^2)$ to the first two moments of a chi-square distribution based on $r$ degrees of freedom and then solving for $r$. The first moment of a chi-square distribution is equal to its degrees of freedom, $r$. So equating the first moment of $r\hat\sigma_s^2/E(\hat\sigma_s^2)$ to the first moment of a chi-square distribution with $r$ degrees of freedom provides no information about $r$. The variance of a chi-square distribution with $r$ degrees of freedom is equal to $2r$. Equating the variance of $r\hat\sigma_s^2/E(\hat\sigma_s^2)$ to the variance of a chi-square distribution with $r$ degrees of freedom, one obtains

$$\mathrm{Var}\left[\frac{r\hat\sigma_s^2}{E(\hat\sigma_s^2)}\right] = 2r \quad\text{or}\quad \frac{r^2}{[E(\hat\sigma_s^2)]^2}\,\mathrm{Var}(\hat\sigma_s^2) = 2r$$

which implies

$$r = \frac{2[E(\hat\sigma_s^2)]^2}{\mathrm{Var}(\hat\sigma_s^2)}$$

The value of $r$ is estimated by replacing $E(\hat\sigma_s^2)$ by $\hat\sigma_s^2$ and $\mathrm{Var}(\hat\sigma_s^2)$ by an estimate of its variance. If the REML or ML solutions are obtained, then the inverse of the information matrix can be used to provide the estimates of the variances of the estimated variance components. SAS-Mixed uses the inverse of the information matrix evaluated at the values of the estimates of the variance components to provide asymptotic estimates of the variances and covariances of the estimated variance components.

Table 20.10 contains the SAS-Mixed code using options "covtest," "cl," and "asycov" to yield the estimated standard errors of the estimated variance components, Z-values computed as the ratio of each estimate to its estimated standard error, the Satterthwaite-type confidence intervals denoted by Lower and Upper, and the asymptotic covariance matrix of the estimates of the variance components. The data step in Table 20.10 computes the approximate degrees of freedom associated with each estimated variance component and then uses those degrees of freedom to compute the Satterthwaite confidence intervals. The approximate degrees of freedom are 1.758 and 8.828 for $\sigma_{var}^2$ and $\sigma_\varepsilon^2$, respectively. The recomputed confidence intervals are identical to those provided by SAS-Mixed.
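
The same recomputation can be sketched outside SAS. The Python fragment below is an illustration only: the function name is hypothetical, the estimates and standard errors are the rounded REML values from Table 20.10, and the chi-square percentage points are copied from the printed data-step output in that table rather than computed, so agreement is limited by rounding.

```python
# (estimate, standard error) from the REML output in Table 20.10
reml = {"Variety": (0.07316, 0.07802), "Residual": (0.05700, 0.02713)}
# (lower, upper) alpha/2 chi-square points as printed in Table 20.10
chi_points = {"Variety": (0.02880, 6.8588), "Residual": (2.60870, 18.7684)}


def satterthwaite_interval(estimate, std_error, chi_lower, chi_upper):
    """Return (df, lower limit, upper limit) using df = 2 * Z**2."""
    z = estimate / std_error
    df = 2.0 * z ** 2
    return df, df * estimate / chi_upper, df * estimate / chi_lower


results = {
    name: satterthwaite_interval(*reml[name], *chi_points[name]) for name in reml
}
for name, (df, low, up) in results.items():
    print(f"{name}: df = {df:.4f}, interval = ({low:.5f}, {up:.5f})")
```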
TABLE 20.10

SAS-Mixed Code with Method = REML and a Data Step to Calculate Satterthwaite-Type Confidence Intervals for Variance Components for the Insect Damage in Wheat of Example 19.2

Proc mixed method=reml data=ex19_1 covtest cl asycov;
class variety;
model damage=/solution;
random variety;
ods output covparms=cov asycov=asycov;

Covariance Parameter Estimates

Covariance Parameter  Estimate  Standard Error  Z-Value  Pr Z    α     Lower    Upper
Variety               0.07316   0.07802         0.94     0.1742  0.05  0.01875  4.4672
Residual              0.05700   0.02713         2.10     0.0178  0.05  0.02681  0.1929

Asymptotic Covariance Matrix of Estimates

Row  Covariance Parameter  CovP1     CovP2
1    Variety               0.006087  -0.00031
2    Residual              -0.00031  0.000736

data cov1; set cov;
df=2*zvalue**2;
chi1_alpha=cinv(alpha/2,df);
chialpha=cinv(1-alpha/2,df);
low=df*estimate/chialpha;
up=df*estimate/chi1_alpha;
proc print;

Covariance Parameter  df       Chi1_Alpha  Chialpha  Low       Up
Variety               1.75840  0.02880     6.8588    0.018755  4.46721
Residual              8.82768  2.60870     18.7684   0.026811  0.19289



REML estimates of the variance components for the data of Example 20.2 are provided by the SAS-Mixed code in Table 20.11. The confidence intervals about the variance components are computed using $df = 2(Z\text{-value})^2$, which provides 0.50, 0.95, 0.84, 1.73, 0, and 8 degrees of freedom, respectively. One should notice that, when there are very few levels associated with a random effect and consequently very few degrees of freedom, the resulting confidence intervals are going to be extremely wide.


20.2.3 Approximate Confidence Interval for a Function of the 
Variance Components 

Often a researcher wishes to construct a confidence interval about some function of the 
variance components in the model. As is described in the next section, there are some 
cases involving balanced models where exact confidence intervals exist. In this section, a 
first-order Taylor's series (Kendall and Stuart, 1973) is used to provide an estimate of the 
TABLE 20.11

SAS-Mixed Code to Provide REML Estimates of and Satterthwaite Confidence Intervals about the Variance Components

PROC MIXED data=EX_20 METHOD=REML COVTEST CL;
CLASS A B C;
MODEL Y=;
RANDOM A B A*B C(B) A*C(B);

Covariance Parameter Estimates

Covariance Parameter  Estimate  Standard Error  Z-Value  Pr Z    α     Lower    Upper
A                     3.5625    7.1533          0.50     0.3092  0.05  0.5170   3774318
B                     93.5313   136.15          0.69     0.2461  0.05  18.1334  141645
A×B                   2.4375    3.7399          0.65     0.2573  0.05  0.4504   8130.42
C(B)                  2.8125    3.0225          0.93     0.1760  0.05  0.7162   181.58
A×C(B)                0         0.4542          0.00     0.5000  0.05  —        —
Residual              0.8125    0.4063          2.00     0.0228  0.05  0.3707   2.9820



variance of a function of variances, which is then used to construct a Satterthwaite-type confidence interval. Let $\sigma$ denote the vector of variance components and let $\mathrm{Var}(\hat\sigma)$ denote the matrix of variances and covariances of the estimates of the variance components. Assume that the function of the variance components of interest is $\varphi(\sigma)$ and that $\varphi(\sigma)$ has continuous first derivatives. The maximum likelihood estimate of $\varphi(\sigma)$ is $\varphi(\hat\sigma)$, where $\hat\sigma$ denotes the maximum likelihood estimate of $\sigma$. An estimate of the inverse of the information matrix corresponding to the variance components is $\hat V(\hat\sigma)$, which provides an estimate of $\mathrm{Var}(\hat\sigma)$. Compute the derivatives of $\varphi(\sigma)$ with respect to each of the variance components and evaluate them at $\hat\sigma$. Then let

$$\hat f' = \left[\frac{\partial\varphi(\sigma)}{\partial\sigma_1^2},\ \frac{\partial\varphi(\sigma)}{\partial\sigma_2^2},\ \ldots,\ \frac{\partial\varphi(\sigma)}{\partial\sigma_k^2},\ \frac{\partial\varphi(\sigma)}{\partial\sigma_\varepsilon^2}\right]_{\sigma = \hat\sigma}$$

The approximate variance of $\varphi(\hat\sigma)$ is $\hat\sigma^2_{\varphi(\hat\sigma)} = \hat f'\,\hat V(\hat\sigma)\,\hat f$. The Satterthwaite approximate degrees of freedom corresponding to this estimate are computed as

$$r = \frac{2[\varphi(\hat\sigma)]^2}{\hat\sigma^2_{\varphi(\hat\sigma)}}$$

Suppose the researcher is interested in estimating the total variability in the data for the insect damage study of Example 19.2; that is, the researcher wants to estimate $\varphi(\sigma) = \sigma_{var}^2 + \sigma_\varepsilon^2$. The vector of first derivatives of $\varphi(\sigma)$ is

$$\hat f' = [1,\ 1]$$

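and the covariance matrix of the estimates of the variance components is

$$\hat V(\hat\sigma) = \begin{bmatrix} \hat\sigma^2_{\hat\sigma^2_{var}} & \hat\sigma_{\hat\sigma^2_{var}\hat\sigma^2_\varepsilon} \\ \hat\sigma_{\hat\sigma^2_{var}\hat\sigma^2_\varepsilon} & \hat\sigma^2_{\hat\sigma^2_\varepsilon} \end{bmatrix}$$

so that $\hat\sigma^2_{\varphi(\hat\sigma)} = \hat\sigma^2_{\hat\sigma^2_{var}} + \hat\sigma^2_{\hat\sigma^2_\varepsilon} + 2\hat\sigma_{\hat\sigma^2_{var}\hat\sigma^2_\varepsilon}$. Using the information in Table 20.10 one obtains

$$\varphi(\hat\sigma) = 0.07316 + 0.05700 = 0.13016$$

and

$$\hat\sigma^2_{\varphi(\hat\sigma)} = 0.006087 + 0.000736 - 2(0.00031) = 0.006203$$

The approximate degrees of freedom corresponding to the estimated variance are

$$r = \frac{2(0.13016)^2}{0.006203} = 5.46$$

Finally, a 95% confidence interval about $\varphi(\sigma) = \sigma_{var}^2 + \sigma_\varepsilon^2$ is

$$0.052287 < \sigma_{var}^2 + \sigma_\varepsilon^2 < 0.70253$$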
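
The first-order Taylor's series calculation for the total variance amounts to a quadratic form in the derivative vector. The following Python sketch illustrates it using the estimates and asymptotic covariance matrix printed in Table 20.10; the helper `quadratic_form` and the variable names are illustrative assumptions, and the chi-square percentage points needed for the interval itself are not computed here.

```python
def quadratic_form(f, v):
    """Compute f' V f for a vector f and a square matrix v given as nested lists."""
    n = len(f)
    return sum(f[i] * v[i][j] * f[j] for i in range(n) for j in range(n))


# REML estimates (variety, residual) and their asymptotic covariance matrix, Table 20.10
sigma_hat = [0.07316, 0.05700]
v_hat = [[0.006087, -0.00031],
         [-0.00031, 0.000736]]

# phi(sigma) = sigma_var^2 + sigma_eps^2, so both first derivatives equal 1.
f = [1.0, 1.0]

phi_hat = sum(sigma_hat)
var_phi = quadratic_form(f, v_hat)
r = 2.0 * phi_hat ** 2 / var_phi

print(f"phi = {phi_hat:.5f}, var = {var_phi:.6f}, r = {r:.2f}")
```

For a nonlinear function such as the intraclass correlation, only the derivative vector `f` would change; the quadratic form and the expression for `r` stay the same.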

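As a second example, the intraclass correlation coefficient is defined to be

$$\rho = \frac{\sigma_{var}^2}{\sigma_\varepsilon^2 + \sigma_{var}^2}$$

An approximate confidence interval can be constructed using the Taylor's series approach where

$$\varphi(\sigma) = \frac{\sigma_{var}^2}{\sigma_\varepsilon^2 + \sigma_{var}^2}$$

The transpose of the estimated vector of derivatives of $\sigma_{var}^2/(\sigma_\varepsilon^2 + \sigma_{var}^2)$, taken with respect to $\sigma_{var}^2$ and $\sigma_\varepsilon^2$ in that order, is

$$\hat f' = \left[\frac{(\hat\sigma_\varepsilon^2 + \hat\sigma_{var}^2) - \hat\sigma_{var}^2}{(\hat\sigma_\varepsilon^2 + \hat\sigma_{var}^2)^2},\ \frac{-\hat\sigma_{var}^2}{(\hat\sigma_\varepsilon^2 + \hat\sigma_{var}^2)^2}\right]$$
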
Using the information in Table 20.10, the estimate of the intraclass correlation is $\hat\rho = 0.5621$ and its estimated variance is 0.2156. The corresponding approximate degrees of freedom are $r = 2.93$. The resulting approximate confidence interval for the intraclass correlation coefficient is (0.1785, 8.214), which can be simplified to (0.1785, 1) since the intraclass correlation coefficient cannot be greater than 1. The first-order Taylor's series approach works quite well for linear functions of the variance components and less well for nonlinear functions such as the intraclass correlation. Better confidence intervals for the intraclass correlation are available and they will be described in Section 20.2.3.

When one uses the sums of squares approach to obtain estimates of the variance components, the usual Satterthwaite approximation can be used to construct a confidence interval. Suppose the estimate of each variance component can be expressed as a linear combination of the mean squares given in an analysis of variance table as $\hat\sigma_s^2 = \sum_{i=1}^{k} c_{is} MS_i + c_{\varepsilon s} MSResidual$. Then the approximate number of degrees of freedom is computed as

$$r = \frac{(\hat\sigma_s^2)^2}{\displaystyle\sum_{i=1}^{k} \frac{(c_{is} MS_i)^2}{df_{MS_i}} + \frac{(c_{\varepsilon s} MSResidual)^2}{df_{MSResidual}}}$$

An approximate $(1 - \alpha)100\%$ confidence interval about $\sigma_s^2$ is

$$\frac{r\hat\sigma_s^2}{\chi^2_{\alpha/2,r}} \le \sigma_s^2 \le \frac{r\hat\sigma_s^2}{\chi^2_{1-(\alpha/2),r}}$$

The coefficients of the mean squares are quite easily obtained from the expected mean squares and the error terms provided by SAS-Mixed when a sum of squares method is used to estimate the variance components. For example, the estimate of $\sigma_a^2$ for Example 20.2 is

$$\hat\sigma_a^2 = \tfrac{1}{8}[MSA - MS(A{\times}B)] = \tfrac{1}{8}(39.0625 - 10.5625) = 3.5625$$

and the approximate number of degrees of freedom is

$$r = \frac{(3.5625)^2}{\dfrac{[(1/8)39.0625]^2}{1} + \dfrac{[(1/8)10.5625]^2}{1}} = 0.496$$

The resulting confidence interval for $\sigma_a^2$ is (0.5170, 3,774,318), the same interval reported for A in Table 20.11, a very wide interval caused by the very low number of degrees of freedom (0.496). Also, the estimate of $\sigma_b^2$ for Example 20.2 is

$$\hat\sigma_b^2 = \tfrac{1}{8}\{MSB - [MS(A{\times}B) + MS(C(B)) - MS(A{\times}C(B))]\}$$

$$= \tfrac{1}{8}(770.0625 - 12.0625 - 10.5625 + 0.8125)$$

$$= 93.5313$$
and the approximate number of degrees of freedom is

$$r = \frac{(93.5313)^2}{\dfrac{[(1/8)770.0625]^2}{1} + \dfrac{[(1/8)12.0625]^2}{2} + \dfrac{[(1/8)10.5625]^2}{1} + \dfrac{[(1/8)0.8125]^2}{2}} = 0.944$$

The resulting confidence interval for $\sigma_b^2$ is (18.1347, 141,493.19), which is also a wide interval caused by the small number of degrees of freedom. Basically, when only a small number of degrees of freedom is available to estimate a variance component, you know very little about that variance component.


20.2.4 Wald-Type Confidence Intervals for Variance Components 

Wald-type confidence intervals can be computed using the asymptotic normality of maximum likelihood estimates. The $(1 - \alpha)100\%$ Wald confidence interval about a variance component $\sigma_s^2$ is $\hat\sigma_s^2 - z_{\alpha/2}\,\hat\sigma_{\hat\sigma_s^2} \le \sigma_s^2 \le \hat\sigma_s^2 + z_{\alpha/2}\,\hat\sigma_{\hat\sigma_s^2}$, where $\hat\sigma_{\hat\sigma_s^2}$ denotes the estimated standard error of $\hat\sigma_s^2$. Wald confidence intervals are symmetric about $\hat\sigma_s^2$. Under an assumption of normality of the data, the sampling distribution associated with a variance is a chi-square distribution, which is not symmetric. But as the number of degrees of freedom increases, the shape of the chi-square distribution becomes much more symmetric. Thus, when the degrees of freedom associated with a variance component are large, the Wald confidence interval should be adequate, but when the number of degrees of freedom is small, the Satterthwaite-type confidence interval will more closely reflect reality. When one uses METHOD = TYPE* in SAS-Mixed and requests confidence intervals, Wald confidence intervals are provided for all variance components, except for the residual variance component, where a Satterthwaite confidence interval is computed.
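
To see how the symmetric Wald interval behaves with few degrees of freedom, it can be computed for the variety variance component using the REML estimate and standard error from Table 20.10. This Python sketch is an illustration only; the function name `wald_interval` is hypothetical, and applying a Wald interval to this REML estimate is done purely for comparison.

```python
from statistics import NormalDist


def wald_interval(estimate, std_error, alpha=0.05):
    """Symmetric Wald interval: estimate +/- z_{alpha/2} * standard error."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return estimate - z * std_error, estimate + z * std_error


# Variety variance component: estimate and standard error from Table 20.10
lower, upper = wald_interval(0.07316, 0.07802)
print(f"Wald 95% interval: ({lower:.4f}, {upper:.4f})")
```

The lower limit is negative, which is not a possible value for a variance, while the Satterthwaite-type interval for the same component in Table 20.10 is (0.01875, 4.4672).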


20.2.5 Some Exact Confidence Intervals 

Since all sums of squares (not including SSRESIDUAL) in the analysis of variance table for a balanced model are independent of the residual sum of squares, the next result can be used to construct a confidence interval about a variance component, say $\sigma_1^2$, with a confidence coefficient of at least size $1 - \alpha$ when there is a mean square in the model with expectation $\sigma_\varepsilon^2 + a\sigma_1^2$. Let $Q_1 = MSRESIDUAL$ and suppose it is based on $u_1$ degrees of freedom, and let $Q_2$ be a mean square based on $u_2$ degrees of freedom with expectation $\sigma_\varepsilon^2 + a\sigma_1^2$. A set of exact simultaneous $(1 - \alpha)100\%$ confidence intervals about $\sigma_\varepsilon^2$ and $\sigma_\varepsilon^2 + a\sigma_1^2$ is given by

$$\frac{u_1Q_1}{\chi^2_{p/2,u_1}} \le \sigma_\varepsilon^2 \le \frac{u_1Q_1}{\chi^2_{1-(p/2),u_1}}$$

$$\frac{u_2Q_2}{\chi^2_{p/2,u_2}} \le \sigma_\varepsilon^2 + a\sigma_1^2 \le \frac{u_2Q_2}{\chi^2_{1-(p/2),u_2}}$$

where $p = 1 - \sqrt{1 - \alpha}$. The intersection of these two regions on the $(\sigma_\varepsilon^2, \sigma_1^2)$ plane provides a $(1 - \alpha)100\%$ simultaneous confidence region for $(\sigma_\varepsilon^2, \sigma_1^2)$. A graph of the confidence region is shown in Figure 20.1. A $(1 - \alpha)100\%$ confidence interval about $\sigma_1^2$ is obtained by determining the maximum and minimum of $\sigma_1^2$ over the confidence region. The minimum value of $\sigma_1^2$ over the intersection region in Figure 20.1 is given by $c$ and the maximum is given by $d$. The values of $c$ and $d$ can be determined by solving for the values of $\sigma_1^2$ where the respective lines intersect. The two sets of lines intersect at

$$c = \frac{u_2Q_2/\chi^2_{p/2,u_2} - u_1Q_1/\chi^2_{1-(p/2),u_1}}{a}$$

and

$$d = \frac{u_2Q_2/\chi^2_{1-(p/2),u_2} - u_1Q_1/\chi^2_{p/2,u_1}}{a}$$

which provides $c < \sigma_1^2 < d$ as the $(1 - \alpha)100\%$ confidence interval about $\sigma_1^2$. The value of $c$ corresponds to the value of $\sigma_1^2$ at which the two lines

$$\sigma_\varepsilon^2 = u_1Q_1/\chi^2_{1-(p/2),u_1} \quad\text{and}\quad \sigma_\varepsilon^2 = -a\sigma_1^2 + u_2Q_2/\chi^2_{p/2,u_2}$$

intersect, and the value of $d$ corresponds to the value of $\sigma_1^2$ at which the two lines

$$\sigma_\varepsilon^2 = u_1Q_1/\chi^2_{p/2,u_1} \quad\text{and}\quad \sigma_\varepsilon^2 = -a\sigma_1^2 + u_2Q_2/\chi^2_{1-(p/2),u_2}$$

intersect.
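The above joint confidence region about $(\sigma_\varepsilon^2, \sigma_1^2)$ can be used to construct a confidence interval about any continuous function of $\sigma_\varepsilon^2$ and $\sigma_1^2$. Let $\varphi(\sigma_\varepsilon^2, \sigma_1^2)$ denote a continuous function of $\sigma_\varepsilon^2$ and $\sigma_1^2$ over the $(\sigma_\varepsilon^2, \sigma_1^2)$ space and let $\Re_\alpha(\sigma_\varepsilon^2, \sigma_1^2)$ denote the joint confidence region for $\sigma_\varepsilon^2$ and $\sigma_1^2$. The lower confidence limit for $\varphi(\sigma_\varepsilon^2, \sigma_1^2)$ is

$$L = \min_{(\sigma_\varepsilon^2, \sigma_1^2) \in \Re_\alpha(\sigma_\varepsilon^2, \sigma_1^2)} [\varphi(\sigma_\varepsilon^2, \sigma_1^2)]$$

and the upper confidence limit for $\varphi(\sigma_\varepsilon^2, \sigma_1^2)$ is

$$U = \max_{(\sigma_\varepsilon^2, \sigma_1^2) \in \Re_\alpha(\sigma_\varepsilon^2, \sigma_1^2)} [\varphi(\sigma_\varepsilon^2, \sigma_1^2)]$$

For example, if

$$\varphi(\sigma_\varepsilon^2, \sigma_1^2) = \frac{\sigma_1^2}{\sigma_\varepsilon^2 + \sigma_1^2}$$

the maximum occurs at $(u_1Q_1/\chi^2_{1-(\alpha/2),u_1},\ c)$ and the minimum occurs at $(u_1Q_1/\chi^2_{\alpha/2,u_1},\ d)$. Thus, using the data in Example 20.1, the resulting confidence interval about $\sigma_1^2/(\sigma_\varepsilon^2 + \sigma_1^2)$ is

$$-0.048511 < \frac{\sigma_1^2}{\sigma_\varepsilon^2 + \sigma_1^2} < 1.89748$$

or

$$0 < \frac{\sigma_1^2}{\sigma_\varepsilon^2 + \sigma_1^2} < 1$$

since it is known that $\sigma_1^2/(\sigma_\varepsilon^2 + \sigma_1^2)$ must be greater than 0 and less than 1. When the numbers of degrees of freedom are small, the minimization and maximization method can lead to wild results, especially when $\varphi(\sigma_\varepsilon^2, \sigma_1^2)$ is a nonlinear function.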
Another confidence interval for $\sigma_1^2$, proposed by Williams (1962), has lower and upper confidence limits given by

$$c_1 = \frac{u_2[Q_2 - Q_1F_{\alpha/2,u_2,u_1}]}{a\,\chi^2_{\alpha/2,u_2}} \quad\text{and}\quad d_1 = \frac{u_2[Q_2 - Q_1F_{1-(\alpha/2),u_2,u_1}]}{a\,\chi^2_{1-(\alpha/2),u_2}}$$

respectively, and has a confidence coefficient of at least $1 - 2\alpha$. Simulation studies (Boardman, 1974; and an unpublished study by the authors) have shown that, if $p$ is chosen to be $\alpha$ rather than $p = 1 - \sqrt{1 - \alpha}$, as described above, the first procedure gives a confidence interval for $\sigma_1^2$ that has a confidence coefficient that is no less than $1 - \alpha$. Similar simulations have shown that the interval proposed by Williams has a confidence coefficient that is no less than $1 - \alpha$. The interval of Williams is a little shorter than the one described by $c < \sigma_1^2 < d$. Thus the $(c, d)$ interval is a little more conservative than Williams's interval.


20.2.6 Example 20.5: Balanced One-Way Random Effects Treatment Structure 

The data in Table 20.12 are from a one-way treatment structure where five workers were randomly selected from the population of workers working at a plant. The levels of workers are random effects and the study was conducted as a completely randomized design structure with five workers and three observations per worker, where the response is the number of units assembled during a fixed period of time. A model that can be used to describe these data is

$$y_{ij} = \mu + w_i + \varepsilon_{ij}, \quad i = 1, 2, \ldots, 5,\ j = 1, 2, 3, \quad w_i \sim \text{i.i.d. } N(0, \sigma_w^2) \text{ and } \varepsilon_{ij} \sim \text{i.i.d. } N(0, \sigma_\varepsilon^2)$$

An analysis of variance table including the expected mean squares is provided in Table 20.13. A 95% confidence interval about $\sigma_\varepsilon^2$ is

$$\frac{16}{20.483} \le \sigma_\varepsilon^2 \le \frac{16}{3.247} \quad\text{or}\quad 0.781 \le \sigma_\varepsilon^2 \le 4.928$$

A 95% confidence interval about $\sigma_\varepsilon^2 + 3\sigma_w^2$ is

$$\frac{128}{11.143} \le \sigma_\varepsilon^2 + 3\sigma_w^2 \le \frac{128}{0.484} \quad\text{or}\quad 11.487 \le \sigma_\varepsilon^2 + 3\sigma_w^2 \le 264.463$$


TABLE 20.12

Data for Example 20.5

Station   Worker 1   Worker 2   Worker 3   Worker 4   Worker 5
1         2          6          7          3          12
2         5          6          9          5          11
3         4          8          8          6          13
TABLE 20.13

SAS-Mixed Code and Resulting Analysis of Variance Table for the Worker Data of Example 20.5

PROC MIXED data=EX_20_5 METHOD=TYPE1 COVTEST CL;
CLASS worker;
MODEL num_units=;
RANDOM worker;

Type I Analysis of Variance

Source     df   SS      MS     EMS                             Error Term     Error df   F-Value   Pr > F
Worker     4    128.0   32.0   Var(Residual) + 3 Var(Worker)   MS(Residual)   10         20.00     <0.0001
Residual   10   16.00   1.6    Var(Residual)                   —              —          —         —
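The entries of the analysis of variance table can be reproduced directly from the data in Table 20.12. The following is a minimal Python sketch, not part of the SAS analysis; the function name `one_way_anova` and the dictionary layout are illustrative assumptions.

```python
# Worker data from Table 20.12: three observations (stations) per worker.
data = {
    "Worker 1": [2, 5, 4],
    "Worker 2": [6, 6, 8],
    "Worker 3": [7, 9, 8],
    "Worker 4": [3, 5, 6],
    "Worker 5": [12, 11, 13],
}


def one_way_anova(groups):
    """Return (df_between, ss_between, df_within, ss_within) for a one-way layout."""
    all_obs = [y for obs in groups.values() for y in obs]
    grand_mean = sum(all_obs) / len(all_obs)
    ss_between = 0.0
    ss_within = 0.0
    for obs in groups.values():
        mean_i = sum(obs) / len(obs)
        ss_between += len(obs) * (mean_i - grand_mean) ** 2
        ss_within += sum((y - mean_i) ** 2 for y in obs)
    df_between = len(groups) - 1
    df_within = len(all_obs) - len(groups)
    return df_between, ss_between, df_within, ss_within


df_b, ss_b, df_w, ss_w = one_way_anova(data)
ms_b, ms_w = ss_b / df_b, ss_w / df_w

n = 3                            # observations per worker
var_resid = ms_w                 # method of moments: E(MS Residual) = Var(Residual)
var_worker = (ms_b - ms_w) / n   # E(MS Worker) = Var(Residual) + 3 Var(Worker)
f_value = ms_b / ms_w

print(f"Worker:   df={df_b}, SS={ss_b:.1f}, MS={ms_b:.1f}, F={f_value:.2f}")
print(f"Residual: df={df_w}, SS={ss_w:.1f}, MS={ms_w:.1f}")
print(f"Var(Residual)={var_resid:.4f}, Var(Worker)={var_worker:.4f}")
```

Equating the mean squares to their expectations gives the method of moments solutions for the two variance components.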


By using the joint confidence region resulting from the intersection of these two intervals, an approximate 95% confidence interval about $\sigma_w^2$ is $c < \sigma_w^2 < d$ where

\[ c = \frac{11.487 - 4.928}{3} = 2.186 \quad\text{and}\quad d = \frac{264.463 - 0.781}{3} = 87.818 \]

Thus one obtains $2.186 < \sigma_w^2 < 87.818$ as an approximate 95% confidence interval for $\sigma_w^2$. This type of procedure can be used to construct a confidence interval about any variance component, say $\sigma_1^2$, when there are two independent mean squares $Q_1$ and $Q_2$ based on $u_1$ and $u_2$ degrees of freedom, respectively, such that $E(Q_2) = E(Q_1) + a\sigma_1^2$. Let $\sigma_0^2 = E(Q_1)$ and replace $\sigma_\varepsilon^2$ by $\sigma_0^2$ in the previous development; then the $(1 - \alpha)100\%$ approximate confidence interval for $\sigma_1^2$ is $c < \sigma_1^2 < d$. The result of Williams (1962) has also been applied to this case (Graybill, 1976, Theorem 15.3.5).
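The arithmetic behind the two marginal intervals and the limits c and d for the worker data can be sketched in Python as follows. The chi-square percentage points used here are an assumption about the inputs: they are taken as the degrees of freedom times the F percentage points with infinite denominator degrees of freedom quoted in Section 20.2.8, which carry one or two more digits than the rounded divisors shown in the example.

```python
def chi2_interval(ss, chi2_upper, chi2_lower):
    """Interval ss/chi2_upper < E(MS) < ss/chi2_lower for a sum of squares."""
    return ss / chi2_upper, ss / chi2_lower


# 95% percentage points (upper 0.025 and lower 0.975 points); assumed inputs.
CHI2_10 = (20.4832, 3.2470)    # 10 df, residual
CHI2_4 = (11.14328, 0.4844)    # 4 df, worker

resid_lo, resid_hi = chi2_interval(16.0, *CHI2_10)       # about Var(Residual)
between_lo, between_hi = chi2_interval(128.0, *CHI2_4)   # about Var(Residual) + 3 Var(Worker)

n = 3  # coefficient of Var(Worker) in the expected mean square
# Minimize and maximize Var(Worker) over the rectangle formed by the two intervals.
c = (between_lo - resid_hi) / n
d = (between_hi - resid_lo) / n

print(f"Var(Residual): {resid_lo:.3f} to {resid_hi:.3f}")
print(f"Var(Residual) + 3 Var(Worker): {between_lo:.3f} to {between_hi:.3f}")
print(f"Var(Worker): c = {c:.3f}, d = {d:.3f}")
```

The upper limit d is sensitive to how many digits are carried in the small chi-square point with 4 degrees of freedom, because that point appears in a denominator.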


20.2.7 Example 20.6 

To demonstrate the above more general procedure, a 90% confidence interval about $\sigma_a^2$ is constructed for the data in Example 20.2, where the data are in Table 20.2, the expected mean squares are in Table 20.3, and the analysis of variance table for the data is in Table 20.4. Let $Q_2 = MSA$ and $Q_1 = MSAB$, both of which are based on 1 degree of freedom. Their expectations are

\[ E(Q_1) = \sigma_\varepsilon^2 + 2\sigma_{ac(b)}^2 + 4\sigma_{ab}^2 \quad\text{and}\quad E(Q_2) = \sigma_\varepsilon^2 + 2\sigma_{ac(b)}^2 + 4\sigma_{ab}^2 + 8\sigma_a^2, \]

respectively. Let $\sigma_0^2 = \sigma_\varepsilon^2 + 2\sigma_{ac(b)}^2 + 4\sigma_{ab}^2$, $\sigma_1^2 = \sigma_a^2$, and $a = 8$. Then the limits of the confidence interval for $\sigma_1^2$ are

\[ c = \frac{(1 \times 39.0625/3.8415) - (1 \times 10.5625/0.00393)}{8} = -334.69 \]

and

\[ d = \frac{(1 \times 39.0625/0.00393) - (1 \times 10.5625/3.8415)}{8} = 1242.10 \]

or

\[ -334.69 < \sigma_a^2 < 1242.10 \]

In this example, the two lines in Figure 20.1 that determine the value of c intersect outside of the parameter space; thus the lower limit is truncated to zero so that the confidence interval only includes values of $\sigma_a^2$ in the parameter space. Consequently, the resulting confidence interval for $\sigma_a^2$ is $0 < \sigma_a^2 < 1242.10$. A similar technique can be used to construct confidence intervals about $\sigma_{ab}^2$, $\sigma_{ac(b)}^2$, and $\sigma_{c(b)}^2$, but it cannot be used to obtain a confidence interval about $\sigma_b^2$.
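The limits in this example, including the truncation of a negative lower limit at zero, can be checked with a short Python sketch. The function name `minmax_limits` and its argument names are hypothetical; the mean squares and chi-square points are the ones appearing in the example.

```python
def minmax_limits(q1, u1, q2, u2, a, chi2_upper, chi2_lower):
    """Limits for sigma_1^2 when E(Q2) = E(Q1) + a * sigma_1^2.

    chi2_upper and chi2_lower are (value for u1 df, value for u2 df) pairs of
    upper and lower chi-square percentage points.
    """
    c = (u2 * q2 / chi2_upper[1] - u1 * q1 / chi2_lower[0]) / a
    d = (u2 * q2 / chi2_lower[1] - u1 * q1 / chi2_upper[0]) / a
    return c, d


# Example 20.6: Q2 = MSA, Q1 = MSAB, each with 1 df, and a = 8.
c, d = minmax_limits(
    q1=10.5625, u1=1,
    q2=39.0625, u2=1,
    a=8,
    chi2_upper=(3.8415, 3.8415),    # upper 0.05 points, 1 df
    chi2_lower=(0.00393, 0.00393),  # lower 0.95 points, 1 df
)

# A variance component cannot be negative, so truncate the lower limit at zero.
lower = max(0.0, c)

print(f"c = {c:.2f}, d = {d:.2f}")
print(f"interval within the parameter space: {lower:.2f} to {d:.2f}")
```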

Burdick and Graybill (1992) present several situations where exact confidence intervals 
can be obtained for the variance components. They also present approximate confidence 
intervals for several functions of the variance components. Exact confidence intervals can 
usually be obtained for balanced designs with equal numbers of observations per cell and 
no missing cells. The intervals are based on having a set of sums of squares in an analysis 
that are distributed as independent chi-square random variables. Burdick and Graybill 
(1992) present confidence intervals developed by Graybill and Wang (1980), Lu et al. (1989) 
and Ting et al. (1990) for various functions of the variance components. The following dis¬ 
cussion is about confidence intervals associated with the balanced one-way random effects 
treatment structure carried out in a completely randomized design structure, which 
Burdick and Graybill refer to as a one-fold nested design. The model is 

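\[ y_{ij} = \mu + u_i + \varepsilon_{ij}, \quad i = 1,2,\ldots,t, \quad j = 1,2,\ldots,n \]

\[ u_i \sim \text{i.i.d. } N(0, \sigma_u^2), \quad \varepsilon_{ij} \sim \text{i.i.d. } N(0, \sigma_\varepsilon^2) \]

and the $u_i$ and the $\varepsilon_{ij}$ are independent random variables.

The analysis of variance table for this model is based on two sums of squares

\[ SSBetween = n \sum_{i=1}^{t} (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 = (t - 1) Q_1 \]

and

\[ SSWithin = \sum_{i=1}^{t} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{i\cdot})^2 = t(n - 1) Q_2 \]

where $Q_1$ and $Q_2$ are the respective mean squares. The expected mean squares corresponding to the sums of squares are

\[ E(MSBetween) = E(Q_1) = \sigma_\varepsilon^2 + n\sigma_u^2 \quad\text{and}\quad E(MSWithin) = E(Q_2) = \sigma_\varepsilon^2, \]

respectively. The two sums of squares are independent random variables with sampling distributions

\[ \frac{(t - 1) Q_1}{\sigma_\varepsilon^2 + n\sigma_u^2} \sim \chi^2_{t-1} \quad\text{and}\quad \frac{t(n - 1) Q_2}{\sigma_\varepsilon^2} \sim \chi^2_{t(n-1)} \]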
The exact $(1 - \alpha)100\%$ confidence interval about $\sigma_\varepsilon^2$ is

\[ \frac{Q_2}{F_{\alpha/2,\,t(n-1),\,\infty}} \le \sigma_\varepsilon^2 \le \frac{Q_2}{F_{1-(\alpha/2),\,t(n-1),\,\infty}} \]

where $F_{\alpha/2,\,t(n-1),\,\infty}$ and $F_{1-(\alpha/2),\,t(n-1),\,\infty}$ are the upper and lower critical points from an F-distribution with $t(n - 1)$ degrees of freedom for the numerator and with $\infty$ degrees of freedom for the denominator. It can be noted that $F_{\alpha/2,\,t(n-1),\,\infty} = \chi^2_{\alpha/2,\,t(n-1)}/[t(n - 1)]$ and $F_{1-(\alpha/2),\,t(n-1),\,\infty} = \chi^2_{1-(\alpha/2),\,t(n-1)}/[t(n - 1)]$. Using these representations for the percentage points, the above confidence interval about $\sigma_\varepsilon^2$ is identical to the interval in Section 20.2.1.

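An approximate $(1 - \alpha)100\%$ confidence interval about $\sigma_u^2$ given in Burdick and Graybill (1992) is

\[ \frac{Q_1 - Q_2 - \sqrt{V_L}}{n} \le \sigma_u^2 \le \frac{Q_1 - Q_2 + \sqrt{V_U}}{n} \]

where

\[ V_L = G_1^2 Q_1^2 + H_2^2 Q_2^2 + G_{12} Q_1 Q_2, \qquad V_U = H_1^2 Q_1^2 + G_2^2 Q_2^2 + H_{12} Q_1 Q_2, \]

\[ G_1 = 1 - \frac{1}{F_{\alpha/2,\,t-1,\,\infty}}, \qquad H_1 = \frac{1}{F_{1-(\alpha/2),\,t-1,\,\infty}} - 1, \]

\[ G_2 = 1 - \frac{1}{F_{\alpha/2,\,t(n-1),\,\infty}}, \qquad H_2 = \frac{1}{F_{1-(\alpha/2),\,t(n-1),\,\infty}} - 1, \]

\[ G_{12} = \frac{(F_{\alpha/2,\,t-1,\,t(n-1)} - 1)^2 - G_1^2 F^2_{\alpha/2,\,t-1,\,t(n-1)} - H_2^2}{F_{\alpha/2,\,t-1,\,t(n-1)}} \]

and

\[ H_{12} = \frac{(1 - F_{1-(\alpha/2),\,t-1,\,t(n-1)})^2 - H_1^2 F^2_{1-(\alpha/2),\,t-1,\,t(n-1)} - G_2^2}{F_{1-(\alpha/2),\,t-1,\,t(n-1)}} \]

When the lower bound is negative, the lower bound is set to 0. The lower bound is negative when $Q_1/Q_2 < F_{\alpha/2,\,t-1,\,t(n-1)}$. Using $Q_1/Q_2 > F_{\alpha,\,t-1,\,t(n-1)}$ as a decision rule to reject $H_0\colon \sigma_u^2 = 0$ vs $H_a\colon \sigma_u^2 > 0$ provides an exact test that is a uniformly most powerful unbiased test (Lehmann, 1986).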

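An approximate $(1 - \alpha)100\%$ confidence interval about $\sigma_\varepsilon^2 + \sigma_u^2$ given in Burdick and Graybill (1992) is

\[ \hat\sigma_\varepsilon^2 + \hat\sigma_u^2 - \frac{\sqrt{G_1^2 Q_1^2 + G_2^2 (n - 1)^2 Q_2^2}}{n} \le \sigma_\varepsilon^2 + \sigma_u^2 \le \hat\sigma_\varepsilon^2 + \hat\sigma_u^2 + \frac{\sqrt{H_1^2 Q_1^2 + H_2^2 (n - 1)^2 Q_2^2}}{n} \]

where

\[ G_1 = 1 - \frac{1}{F_{\alpha/2,\,t-1,\,\infty}}, \qquad G_2 = 1 - \frac{1}{F_{\alpha/2,\,t(n-1),\,\infty}}, \]

\[ H_1 = \frac{1}{F_{1-(\alpha/2),\,t-1,\,\infty}} - 1, \qquad H_2 = \frac{1}{F_{1-(\alpha/2),\,t(n-1),\,\infty}} - 1 \]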

An approximate $(1 - \alpha)100\%$ confidence interval about the intraclass correlation,

\[ \rho = \frac{\sigma_u^2}{\sigma_\varepsilon^2 + \sigma_u^2}, \]

is

\[ \frac{L^* - 1}{L^* - 1 + n} \le \rho \le \frac{U^* - 1}{U^* - 1 + n} \]

where

\[ L^* = \frac{Q_1}{Q_2 F_{\alpha/2,\,t-1,\,t(n-1)}} \quad\text{and}\quad U^* = \frac{Q_1}{Q_2 F_{1-(\alpha/2),\,t-1,\,t(n-1)}} \]

A $(1 - \alpha)100\%$ confidence interval about the ratio $\sigma_u^2/\sigma_\varepsilon^2$ is given by

\[ \frac{L^* - 1}{n} \le \frac{\sigma_u^2}{\sigma_\varepsilon^2} \le \frac{U^* - 1}{n} \]

Burdick and Graybill (1992) also describe methods for constructing confidence intervals about variance components and functions of variance components for both balanced and unbalanced designs when there are two or more variance components in the model. The discussion in this section has been restricted to models with two variance components. The following example is used to demonstrate the computations of these confidence intervals.

20.2.8 Example 20.6 Continued 

The data of Example 20.5 are used to demonstrate the computation of the confidence intervals for $\sigma_\varepsilon^2$, $\sigma_u^2$, $\sigma_\varepsilon^2 + \sigma_u^2$, $\sigma_u^2/\sigma_\varepsilon^2$, and $\rho$. The two mean squares are $Q_1 = 32.0$ and $Q_2 = 1.6$, with $n = 3$, $t - 1 = 4$, $t(n - 1) = 10$, $F_{0.025,4,10} = 4.46834$, $F_{0.975,4,10} = 0.11307$, $F_{0.025,4,\infty} = 2.78582$, $F_{0.025,10,\infty} = 2.04832$, $F_{0.975,4,\infty} = 0.12110$, $F_{0.975,10,\infty} = 0.32470$, $G_1 = 0.64104$, $G_2 = 0.51179$, $H_1 = 7.25732$, $H_2 = 2.07979$, $G_{12} = -0.11208$, $H_{12} = -8.15882$, $V_L = 426.129$, $V_U = 53515.71$, $L^* = 4.47593$, and $U^* = 176.878$. The 95% confidence intervals for $\sigma_\varepsilon^2$, $\sigma_u^2$, $\sigma_\varepsilon^2 + \sigma_u^2$, $\sigma_u^2/\sigma_\varepsilon^2$, and $\rho$ are $[0.78113 < \sigma_\varepsilon^2 < 4.92767]$, $[3.25237 < \sigma_u^2 < 87.2449]$, $[4.87382 < \sigma_\varepsilon^2 + \sigma_u^2 < 89.1765]$, $[1.15864 < \sigma_u^2/\sigma_\varepsilon^2 < 58.6259]$, and $[0.53675 < \rho < 0.98323]$, respectively.
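Several of these quantities can be recomputed from the two mean squares and the quoted F percentage points. The Python sketch below is illustrative only: the variable names are assumptions, the percentage points are typed in from the example rather than computed, and only the lower limit for the between-group variance is worked out (the upper limit follows the same pattern using $V_U$).

```python
from math import sqrt

q1, q2 = 32.0, 1.6   # mean squares between and within workers
n = 3                # observations per worker

# F percentage points quoted in the example (upper 0.025 and lower 0.975 points).
f_up_4_10, f_lo_4_10 = 4.46834, 0.11307      # 4 and 10 df
f_up_4_inf, f_lo_4_inf = 2.78582, 0.12110    # 4 and infinite df
f_up_10_inf, f_lo_10_inf = 2.04832, 0.32470  # 10 and infinite df

g1 = 1 - 1 / f_up_4_inf
g2 = 1 - 1 / f_up_10_inf
h1 = 1 / f_lo_4_inf - 1
h2 = 1 / f_lo_10_inf - 1
g12 = ((f_up_4_10 - 1) ** 2 - g1 ** 2 * f_up_4_10 ** 2 - h2 ** 2) / f_up_4_10

# Exact interval for the within-worker variance.
resid = (q2 / f_up_10_inf, q2 / f_lo_10_inf)

# Lower limit for the between-worker variance, truncated at zero.
v_l = g1 ** 2 * q1 ** 2 + h2 ** 2 * q2 ** 2 + g12 * q1 * q2
between_lower = max(0.0, (q1 - q2 - sqrt(v_l)) / n)

# Interval for the total variance.
total_hat = (q1 + (n - 1) * q2) / n
total = (
    total_hat - sqrt(g1 ** 2 * q1 ** 2 + g2 ** 2 * (n - 1) ** 2 * q2 ** 2) / n,
    total_hat + sqrt(h1 ** 2 * q1 ** 2 + h2 ** 2 * (n - 1) ** 2 * q2 ** 2) / n,
)

# Intervals for the variance ratio and the intraclass correlation.
l_star = q1 / (q2 * f_up_4_10)
u_star = q1 / (q2 * f_lo_4_10)
ratio = ((l_star - 1) / n, (u_star - 1) / n)
rho = ((l_star - 1) / (l_star - 1 + n), (u_star - 1) / (u_star - 1 + n))

for name, value in [("residual", resid), ("total", total), ("ratio", ratio), ("rho", rho)]:
    print(f"{name}: {value[0]:.5f} to {value[1]:.5f}")
print(f"between-worker lower limit: {between_lower:.5f}")
```

Because the percentage points are entered with five or six significant digits, the last printed digit or two of each limit can differ slightly from values obtained with full-precision quantiles.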

The Satterthwaite type confidence intervals from the REML solution are displayed in Table 20.14. Three methods have been used to construct a confidence interval about $\sigma_u^2$. The three 95% confidence intervals are Burdick and Graybill ($3.25237 < \sigma_u^2 < 87.2449$), REML ($3.4964 < \sigma_u^2 < 99.2870$), and min-max over the joint confidence region ($2.186 < \sigma_u^2 < 87.818$). There is not much difference among the three intervals and the differences
TABLE 20.14

SAS-Mixed Code and REML Estimates of the Variance Components for the Worker Data of Example 20.5

PROC MIXED data=EX_20_5 METHOD=REML COVTEST CL;
CLASS worker;
MODEL num_units=;
RANDOM worker;

Covariance Parameter Estimates

Covariance Parameter   Estimate   Standard Error   Z-Value   Pr Z     Alpha   Lower    Upper
Worker                 10.1333    7.5462           1.34      0.0897   0.05    3.4964   99.2870
Residual               1.6000     0.7155           2.24      0.0127   0.05    0.7811   4.9277


would decrease as the number of levels of the random effects corresponding to the 
variance components increases. 


20.3 Simulation Study 

Since there are so many options for computing confidence intervals, one can carry out a 
simulation study using the structure of one's given data to provide information as to how the 
confidence interval procedures are performing. For example, the information in Table 20.15 
consists of SAS data step code to carry out a simulation for the structure of the data set in 
Example 20.5, where there are five workers with three observations each. The values of the 
variance components used in the simulation correspond to the REML estimates from 
Example 20.5 in Table 20.14 (recall that the method of moments estimates are identical to 
the REML estimates in this case). The simulations showed that 96.63% of the confidence
intervals included the true value of $\sigma_w^2$ and 95.26% of the confidence intervals included the
true value of $\sigma_\varepsilon^2$. Both of these empirical confidence rates are very close to the 95% coverage
that one expects to have. The coverage of each of these Satterthwaite confidence intervals 
is quite adequate for this size of design when the data are normally distributed. 

The sums of squares from the analysis of variance table for an unbalanced model do not 
necessarily have independent chi-square distributions. Thus, the techniques used for 
balanced models cannot be applied directly to unbalanced models without violating the 
assumptions. If the model is not too unbalanced, the sums of squares will nearly be 
independent chi-square random variables, and the balanced-model techniques should 
provide effective confidence intervals. For unbalanced models and large sample sizes, the 
maximum likelihood estimates and their asymptotic properties can be used to construct 
confidence intervals about the variance components. The above examples illustrate 
how different confidence intervals can be for any given problem. When the design is 
unbalanced and the sample size is not large, one suggestion has been to use the widest 
confidence interval. Another technique to evaluate the effectiveness of a particular method 
is to use simulation. 
TABLE 20.15

Simulation Code and Results for One-Way Random Effects Check on Coverage of Satterthwaite Confidence Intervals

Variable      Value   Frequency   Percentage
sig2w_check   0       337         3.37
              1       9663        96.63
sig2e_check   0       474         4.74
              1       9526        95.26

dm 'log; clear; output; clear;';
options nodate center linesize=75 pagesize=55 PAGENO=1;
ods rtf file="c:\amd_i\chapter 20\Simulation of Cl REML.rtf" bodytitle;
TITLE "Simulation of data set like EXAMPLE 20.5-1 WAY AOV - Balanced Random Effects";

%let sig2e=1.6; %let sig2w=10.133; **set the two variances to values;
ods listing exclude all; **do not print following analyses;
ods rtf exclude all;
ods noresults;

data gen; seed=5494812; *select a seed for the random number generator;
  sige=sqrt(&sig2e); **calculate the standard deviations;
  sigw=sqrt(&sig2w);
  do nsim=1 to 10000; **generate 10,000 data sets;
    do work=1 to 5; **five workers in each study;
      w=normal(seed)*sigw; **random effect for workers;
      do j=1 to 3; **3 observations per worker;
        num=7 + w + normal(seed)*sige; **generate obs with mean 7;
        output;
      end;
    end;
  end;

proc mixed data=gen covtest cl; by nsim; **fit model for each data set;
  class work; model num=; random work;
  ods output covparms=cov; **output cov parameter estimates;
run;
ods listing select all; **print the following;
ods rtf select all;

data work; set cov; if covparm='work';
  sig2w_check=(lower <= &sig2w <= upper); **check to see if confidence interval contains sig2w;
proc freq data=work; table sig2w_check;

data resid; set cov; if covparm='Residual';
  sig2e_check=(lower <= &sig2e <= upper); **check to see if confidence interval contains sig2e;
proc freq data=resid; table sig2e_check;
run;
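The same kind of coverage check can be sketched without SAS. The Python version below is a simplified illustration and not a translation of the code above: it checks only the exact chi-square interval for the residual variance (which for this balanced design has the same limits as the residual row of Table 20.14), the function name and defaults are assumptions, and the random number stream differs from the SAS stream, so the percentage obtained will not match Table 20.15 exactly.

```python
import random


def simulate_coverage(n_sim=10000, t=5, n=3, mu=7.0, sig2w=10.133, sig2e=1.6,
                      chi2_upper=20.483, chi2_lower=3.247, seed=5494812):
    """Fraction of simulated data sets whose 95% interval covers sig2e.

    chi2_upper and chi2_lower are the 0.025 and 0.975 chi-square points for
    t*(n-1) = 10 df; they must be changed if t or n is changed.
    """
    rng = random.Random(seed)
    sd_w, sd_e = sig2w ** 0.5, sig2e ** 0.5
    covered = 0
    for _ in range(n_sim):
        ss_within = 0.0
        for _ in range(t):
            w = rng.gauss(0.0, sd_w)                       # random worker effect
            obs = [mu + w + rng.gauss(0.0, sd_e) for _ in range(n)]
            mean_i = sum(obs) / n
            ss_within += sum((y - mean_i) ** 2 for y in obs)
        lower, upper = ss_within / chi2_upper, ss_within / chi2_lower
        covered += lower <= sig2e <= upper                 # adds 1 when covered
    return covered / n_sim


coverage = simulate_coverage()
print(f"empirical coverage for the residual variance: {coverage:.4f}")
```

Extending the sketch to the worker variance would require the Satterthwaite degrees of freedom and chi-square points for non-integer degrees of freedom, which are outside what the standard library provides.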


20.4 Concluding Remarks 

This chapter describes methods to test hypotheses and construct confidence intervals about
variance components and functions of the variance components. Hypothesis testing is
accomplished using the sums of squares approach, where the test statistics are constructed
using the expected mean squares, or by using likelihood ratio tests. Tests based on the
sums of squares method seem to have better coverage for small sample size experiments
than those based on the likelihood ratio. Confidence intervals provided by SAS-Mixed are
described as well as other confidence intervals for special situations. The usefulness of a
particular confidence interval for a given data structure can be evaluated by using a
simulation study as described in Section 20.3. Approximate methods revolving around the
Satterthwaite approximation were described for unbalanced models and for some situations
with balanced models. Several examples were included to demonstrate the techniques.
The confidence intervals provided by SAS-Mixed seem to be adequate for the individual
variance components, but specialized methods are needed for confidence intervals for
nonlinear functions of the variance components such as the intraclass correlation
coefficient or the ratio of two variance components.


20.5 Exercises 


20.1 For the coffee price data in Exercise 19.1, 

1) Provide tests of the following hypotheses 

$H_0\colon \sigma_{\text{state}}^2 = 0$ vs $H_a\colon \sigma_{\text{state}}^2 > 0$ and $H_0\colon \sigma_{\text{city}}^2 = 0$ vs $H_a\colon \sigma_{\text{city}}^2 > 0$


2) Construct confidence intervals about 


20.2 Use the data in Exercise 19.2 and 

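1) Provide tests of the hypotheses $H_0\colon \sigma_{\text{row}}^2 = 0$ vs $H_a\colon \sigma_{\text{row}}^2 > 0$, $H_0\colon \sigma_{\text{col}}^2 = 0$ vs
$H_a\colon \sigma_{\text{col}}^2 > 0$, and $H_0\colon \sigma_{\text{row} \times \text{col}}^2 = 0$ vs $H_a\colon \sigma_{\text{row} \times \text{col}}^2 > 0$ using type I, type II, and type
III sums of squares. Discuss the differences observed.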

2) Provide 95% confidence intervals about each of the variance components in 
the model. 

20.3 For the following balanced one-way random effects data set from a completely
randomized design structure, provide confidence intervals about $\sigma_\varepsilon^2$, $\sigma_u^2$, $\sigma_\varepsilon^2 + \sigma_u^2$,
$\sigma_u^2/\sigma_\varepsilon^2$, and $\rho$ using the methods of Burdick and Graybill described in Section
20.2. The data in the following table consist of the amount of drug recovered
from five random samples of livestock feed from each of six randomly selected
batches of feed. It is of interest to determine the sample to sample within batch
variability caused by the mixing of the drug into the livestock feed and to
determine the batch to batch variability.

Sample Within Batch   Batch_1   Batch_2   Batch_3   Batch_4   Batch_5   Batch_6
1                     5.75      5.71      5.93      5.87      6.05      6.43
2                     5.78      5.69      5.80      5.75      6.04      6.57
3                     5.94      5.87      5.73      5.94      5.72      6.52
4                     5.97      5.62      5.87      5.95      5.92      6.45
5                     5.76      5.85      5.72      6.06      5.88      6.27





Case Study: Analysis of a Random Effects Model 


The previous three chapters described methods for analyzing random effects models and 
provided some examples to demonstrate the various techniques of analysis. This chapter 
presents the analysis of a more complex experimental situation, which includes estima¬ 
tion, model building, hypothesis testing, and confidence interval estimation. 


21.1 Data Set 

In this experiment, the efficiency of workers in assembly lines at several plants was studied. 
Three plants were randomly selected from the population of plants owned by the corpo¬ 
ration. Four assembly sites and three workers were randomly selected from each plant. 
Each worker was expected to work five times at each assembly site in her plant, but because 
of scheduling problems and other priorities, the number of times that each worker actually 
worked varied from worker to worker. The order in which a worker was to be at each site 
was randomized and they adhered to that schedule as much as possible. The response 
variable was a measure of efficiency in assembling parts as a function of the number of units 
assembled and the number of errors made. The efficiency scores as well as the plant number, 
site number, and worker number are listed in Table 21.1 where EFF_1, EFF_2,..., EFF_5 
denote the scores for the five possible days that a worker could have worked. 

All three factors in the study are random where sites and workers are nested within a
plant. Thus the site and worker effects as well as their interaction effect are nested within
plant. The model used to describe the data is

\[ y_{ijkl} = \mu + p_i + s_{j(i)} + w_{k(i)} + (sw)_{jk(i)} + \varepsilon_{ijkl} \qquad (21.1) \]

for

\[ i = 1,2,3, \quad j = 1,2,3,4, \quad k = 1,2,3, \quad\text{and}\quad l = 1,\ldots,n_{ijk}, \]

where $p_i$ is the $i$th plant effect, $s_{j(i)}$ is the $j$th site effect within plant $i$, $w_{k(i)}$ is the $k$th worker
effect within plant $i$, $(sw)_{jk(i)}$ is the interaction effect between site and worker in plant $i$, and
$\varepsilon_{ijkl}$ is the residual error term.

It is assumed that

\[ p_i \sim \text{i.i.d. } N(0, \sigma_p^2), \quad s_{j(i)} \sim \text{i.i.d. } N(0, \sigma_s^2), \quad w_{k(i)} \sim \text{i.i.d. } N(0, \sigma_w^2), \]
\[ (sw)_{jk(i)} \sim \text{i.i.d. } N(0, \sigma_{sw}^2), \quad \varepsilon_{ijkl} \sim \text{i.i.d. } N(0, \sigma_\varepsilon^2), \]

and all the $p_i$, $s_{j(i)}$, $w_{k(i)}$, $(sw)_{jk(i)}$, and $\varepsilon_{ijkl}$ are independent random variables.


TABLE 21.1

The Data for the Case Study Where EFF_i Denotes the ith Time a Worker Worked at a Site

Plant   Site/Plant   Worker/Plant   EFF_1   EFF_2   EFF_3   EFF_4   EFF_5
1       1            1              100.6   106.8   100.6   —       —
1       1            2              92.3    92.0    97.2    93.9    93.0
1       1            3              96.9    96.1    100.8   —       —
1       2            1              110.0   105.8   —       —       —
1       2            2              103.2   100.5   100.2   97.7    —
1       2            3              92.5    85.9    85.2    89.4    88.7
1       3            1              100.0   102.5   97.6    98.7    98.7
1       3            2              96.4    —       —       —       —
1       3            3              86.8    —       —       —       —
1       4            1              98.2    99.5    —       —       —
1       4            2              108.0   108.9   107.9   —       —
1       4            3              94.4    93.0    91.0    —       —
2       5            4              82.6    —       —       —       —
2       5            5              72.7    —       —       —       —
2       5            6              82.5    82.1    82.0    —       —
2       6            4              96.5    100.1   101.9   97.9    95.9
2       6            5              71.7    72.1    72.4    71.4    —
2       6            6              80.9    84.0    82.2    83.4    81.5
2       7            4              87.9    93.5    88.9    92.8    —
2       7            5              78.4    80.4    83.8    77.7    81.2
2       7            6              96.3    92.4    92.0    95.8    —
2       8            4              83.6    82.7    87.7    88.0    82.5
2       8            5              82.1    79.9    81.9    82.6    78.6
2       8            6              77.7    78.6    77.2    78.8    80.5
3       9            7              107.6   108.8   107.2   104.2   105.4
3       9            8              97.1    94.2    91.5    99.2    —
3       9            9              87.1    —       —       —       —
3       10           7              96.1    98.5    97.3    93.5    —
3       10           8              91.9    —       —       —       —
3       10           9              97.8    95.9    —       —       —
3       11           7              101.1   —       —       —       —
3       11           8              88.0    91.4    90.3    91.5    85.7
3       11           9              95.9    89.7    —       —       —
3       12           7              109.1   —       —       —       —
3       12           8              89.6    86.0    91.2    87.4    —
3       12           9              101.4   100.1   102.1   98.4    —






21.2 Estimation 

The method of moments procedure with type I–III sums of squares, REML, maximum
likelihood, and MIVQUE0 with and without the NOBOUND option are used to obtain
solutions for the variance components, and the results are displayed in Table 21.2. The
SAS®-Mixed code used to obtain the REML estimates is also included; the other
estimation techniques can be selected by specifying the corresponding name with the
Method= option. The expected mean squares of the type I and III sums of squares are
given in Tables 21.3 and 21.4, which can be used to construct the equations required to
obtain the method of moments estimators of each of the variance components. The REML,
ML, and MIVQUE0 estimates for $\sigma_s^2$ are all equal to zero, while the MIVQUE0 with the
nobound option as well as the type I–III solutions for $\sigma_s^2$ are negative. The MIVQUE0
solution is obtained by setting the solution for $\sigma_s^2$ equal to zero in the MIVQUE0 solution
obtained with the nobound option. The estimates of the variance components are obtained by
converting the negative solutions to 0; that is, the estimate of $\sigma_s^2$ is $\hat\sigma_s^2 = 0$. Since the estimate


TABLE 21.2

Solutions of the Variance Components for Model 21.1 Using Each of the Methods Available in
Proc Mixed, Where MVQ Denotes MIVQUE0 and MVQNB Denotes MIVQUE0 with the
No-Bound Option. Proc Mixed Code Is for Method = REML

proc mixed data=EX_21_1 method=reml cl covtest;
class plant worker site;
model efficiency=;
random plant worker(plant) site(plant) site*worker(plant);

Covariance Parameter                     ML        REML      MVQ       MVQNB     Type I    Type II   Type III
Plant $\sigma_p^2$                       29.6011   50.4898   54.2064   54.2064   48.8063   47.8649   59.6007
Worker(plant) $\sigma_w^2$               28.9381   28.9451   24.2302   24.2302   24.2549   27.0455   24.9422
Site(plant) $\sigma_s^2$                 0.0000    0.0000    0.0000    -4.4155   -4.8780   -4.8780   -4.0644
Worker × site(plant) $\sigma_{sw}^2$     28.7593   28.7707   29.8389   29.8389   35.6167   35.6167   35.6167
Residual $\sigma_\varepsilon^2$          4.9825    4.9818    6.6163    6.6163    4.9831    4.9831    4.9831


TABLE 21.3

Type I Analysis of Variance Table with Expected Mean Squares

Type I Analysis of Variance

Source                 df   Mean Square    Expected Mean Square
Plant                  2    2313.758960    Var(Residual) + 3.9277 Var[worker × site(plant)] + 10.307 Var[site(plant)] + 13.136 Var[worker(plant)] + 38.941 Var(plant)
Worker(plant)          6    456.941927     Var(Residual) + 3.9015 Var[worker × site(plant)] + 0.6563 Var[site(plant)] + 13.037 Var[worker(plant)]
Site(plant)            9    84.049170      Var(Residual) + 3.4789 Var[worker × site(plant)] + 9.1928 Var[site(plant)]
Worker × site(plant)   18   106.738256     Var(Residual) + 2.8569 Var[worker × site(plant)]
Residual               82   4.983134       Var(Residual)
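The type I method of moments solutions in Table 21.2 can be recovered from Table 21.3 by equating each mean square to its expected mean square and solving from the bottom of the table upward. The Python sketch below illustrates this back-substitution; the data layout and variable names are assumptions made for the illustration.

```python
# Each row: (source, mean square, coefficients of the components in the order
# Residual, Worker x site(plant), Site(plant), Worker(plant), Plant).
# Rows are listed from the bottom of Table 21.3 upward, so each row introduces
# exactly one new variance component.
rows = [
    ("Residual",             4.983134,    [1.0]),
    ("Worker x site(plant)", 106.738256,  [1.0, 2.8569]),
    ("Site(plant)",          84.049170,   [1.0, 3.4789, 9.1928]),
    ("Worker(plant)",        456.941927,  [1.0, 3.9015, 0.6563, 13.037]),
    ("Plant",                2313.758960, [1.0, 3.9277, 10.307, 13.136, 38.941]),
]

solutions = []
for source, mean_square, coefs in rows:
    # Contribution of the components that have already been solved for.
    known = sum(c * s for c, s in zip(coefs, solutions))
    # The last coefficient belongs to the component introduced by this row.
    solutions.append((mean_square - known) / coefs[len(solutions)])

for (source, _, _), value in zip(rows, solutions):
    print(f"{source:22s} {value:9.4f}")

# Negative solutions are converted to zero to obtain the estimates.
estimates = [max(0.0, s) for s in solutions]
```

The solution for site(plant) is negative, so its estimate is set to zero, in line with the discussion of Table 21.2.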
TABLE 21.4

Type III Analysis of Variance Table with Expected Mean Squares

Type III Analysis of Variance

Source                 df   Mean Square    Expected Mean Square
Plant                  2    1933.164549    Var(Residual) + 2.2998 Var[worker × site(plant)] + 6.8995 Var[site(plant)] + 9.1993 Var[worker(plant)] + 27.598 Var(plant)
Worker(plant)          6    324.942651     Var(Residual) + 2.3633 Var[worker × site(plant)] + 9.4533 Var[worker(plant)]
Site(plant)            9    67.811366      Var(Residual) + 2.6823 Var[worker × site(plant)] + 8.0468 Var[site(plant)]
Worker × site(plant)   18   106.738256     Var(Residual) + 2.8569 Var[worker × site(plant)]
Residual               82   4.983134       Var(Residual)


of $\sigma_s^2$ is zero, the next step would be to test hypotheses about the variance components in
the model. A stepwise deletion process can be used to remove random components from
the model in an attempt to obtain a simpler model that describes the variance in the
process. Deleting a term from the model essentially sets the removed variance component
equal to zero. Such model building can occur for any variance components corresponding to factors
in the treatment structure, but there will be no model building for variance components
corresponding to factors in the design structure.


21.3 Model Building 

The analysis of variance table, based on type I sums of squares, is used to build a model by 
testing hypotheses about the variance components. The process starts by investigating the 
variance component associated with the last sum of squares and then working up the line 
to the variance component associated with the first sum of squares. The null hypothesis is 
that the variance component is equal to zero. The observed and expected mean squares for 
the type I sums of squares method are listed in Table 21.3. 

The first step is to test $H_0\colon \sigma_{sw}^2 = 0$ vs $H_a\colon \sigma_{sw}^2 > 0$. On inspecting the expected mean
squares in Table 21.3, MS(Residual) is determined to be the appropriate divisor; thus, the
test statistic is

\[ F_c = \frac{MS[\text{worker} \times \text{site(plant)}]}{MS(\text{Residual})} = 21.42 \quad \text{(which is in Table 21.5)} \]

The value of $F_c$ is compared with the critical point from an F-distribution
with 18 degrees of freedom for the numerator and 82 degrees of freedom for the
denominator. The observed significance level is less than 0.0001, which indicates that $\sigma_{sw}^2$ is an
important source of variability in the process.

The second step is to test $H_0\colon \sigma_s^2 = 0$ vs $H_a\colon \sigma_s^2 > 0$. The $E\{MS[\text{site(plant)}]\}$ equals
$\sigma_\varepsilon^2 + 3.4789\sigma_{sw}^2 + 9.1928\sigma_s^2$, and no other mean square exists with expectation $\sigma_\varepsilon^2 + 3.4789\sigma_{sw}^2$
that could be used as a divisor for the test statistic. Thus, a mean square $Q_s^*$ needs to be
constructed from MS(Residual) and MS[worker × site(plant)] such that $E(Q_s^*) = \sigma_\varepsilon^2 + 3.4789\sigma_{sw}^2$.
Such a $Q_s^*$ is

\[ Q_s^* = 3.4789 \left\{ \frac{MS[\text{worker} \times \text{site(plant)}]}{2.8569} \right\} + \left[ 1 - \frac{3.4789}{2.8569} \right] MS(\text{Residual}) \]
\[ = 1.2177\, MS[\text{worker} \times \text{site(plant)}] - 0.2177\, MS(\text{Residual}) = 128.921 \]

A chi-square distribution can be used to approximate the sampling distribution of
$r_s Q_s^*/(\sigma_\varepsilon^2 + 3.4789\sigma_{sw}^2)$ by using the Satterthwaite approximation to determine the
associated degrees of freedom as

\[ r_s = \frac{(Q_s^*)^2}{\dfrac{\{1.2177\, MS[\text{worker} \times \text{site(plant)}]\}^2}{18} + \dfrac{[0.2177\, MS(\text{Residual})]^2}{82}} = \frac{(128.921)^2}{\dfrac{[1.2177 \times 106.738]^2}{18} + \dfrac{[0.2177 \times 4.983]^2}{82}} = 17.7 \]

The test statistic is

\[ F_c = \frac{MS[\text{site(plant)}]}{Q_s^*} = 0.65 \quad \text{(see Table 21.5)} \]

which is approximately distributed as an F-distribution with degrees of freedom 9 and
17.7. The observed significance level of the test is 0.7396, which indicates that $\sigma_s^2$ is a
negligible source of variation in the process; that is, one fails to reject $H_0\colon \sigma_s^2 = 0$ vs $H_a\colon \sigma_s^2 > 0$.
Tables 21.4 and 21.6 contain the mean squares, expected mean squares, appropriate error
terms, approximate denominator degrees of freedom, and test statistics based on the
type III sums of squares. The approximate F-statistic to test $H_0\colon \sigma_s^2 = 0$ vs $H_a\colon \sigma_s^2 > 0$ has
a value of 0.67 with estimated denominator degrees of freedom 18.1 and an observed

TABLE 21.5

Type I Tests of Hypotheses about the Variance Components

Type I Analysis of Variance

Source                 df   Error Term                                                                                                     Error df   F-Value   Pr > F
Plant                  2    1.0076 MS[worker(plant)] + 1.0493 MS[site(plant)] - 1.279 MS[worker × site(plant)] + 0.2221 MS(Residual)    4.5859     5.60      0.0588
Worker(plant)          6    0.0714 MS[site(plant)] + 1.2787 MS[worker × site(plant)] - 0.3501 MS(Residual)                               19.066     3.25      0.0227
Site(plant)            9    1.2177 MS[worker × site(plant)] - 0.2177 MS(Residual)                                                         17.701     0.65      0.7396
Worker × site(plant)   18   MS(Residual)                                                                                                   82         21.42     <0.0001
Residual               82   —                                                                                                              —          —         —
TABLE 21.6

Type III Tests of Hypotheses about the Variance Components

Type III Analysis of Variance

Source                 df   Error Term                                                                                                     Error df   F-Value   Pr > F
Plant                  2    0.9731 MS[worker(plant)] + 0.8574 MS[site(plant)] - 0.805 MS[worker × site(plant)] - 0.0256 MS(Residual)    4.7631     6.71      0.0412
Worker(plant)          6    0.8272 MS[worker × site(plant)] + 0.1728 MS(Residual)                                                         18.352     3.64      0.0148
Site(plant)            9    0.9389 MS[worker × site(plant)] + 0.0611 MS(Residual)                                                         18.11      0.67      0.7217
Worker × site(plant)   18   MS(Residual)                                                                                                   82         21.42     <0.0001
Residual               82   —                                                                                                              —          —         —


significance level of 0.7217. The conclusion is the same for $\sigma_s^2$ from the type III analysis, as
was obtained from the type I analysis. Since site(plant) is part of the treatment structure
and $\sigma_s^2$ is negligible, one strategy is to set $\sigma_s^2$ equal to zero, eliminating $s_{j(i)}$ from the model
and fitting a reduced model to the data.
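The constructed divisor for the type I test of the site(plant) variance component, together with its Satterthwaite degrees of freedom, can be checked numerically. The Python sketch below uses the mean squares and expected mean square coefficients from Table 21.3; the helper name `satterthwaite` is hypothetical. Because the coefficient ratio is carried at full precision here, the constructed mean square may differ slightly from the printed value.

```python
def satterthwaite(terms):
    """Linear combination of mean squares and its approximate degrees of freedom.

    terms is a list of (coefficient, mean square, df) tuples.
    """
    q_star = sum(c * ms for c, ms, _ in terms)
    denom = sum((c * ms) ** 2 / df for c, ms, df in terms)
    return q_star, q_star ** 2 / denom


ms_site = 84.049170           # MS[site(plant)], 9 df
ms_ws, df_ws = 106.738256, 18  # MS[worker x site(plant)]
ms_res, df_res = 4.983134, 82  # MS(Residual)

# Match the coefficient 3.4789 of Var[worker x site(plant)] in E{MS[site(plant)]}.
k = 3.4789 / 2.8569
q_star, r_s = satterthwaite([(k, ms_ws, df_ws), (1 - k, ms_res, df_res)])
f_site = ms_site / q_star

print(f"coefficients: {k:.4f} and {1 - k:.4f}")
print(f"Q* = {q_star:.3f}, approximate df = {r_s:.3f}, F = {f_site:.2f}")
```

The same helper applies to any error term that is a linear combination of independent mean squares, such as those listed for Plant and Worker(plant).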


21.4 Reduced Model 

The reduced model is

\[ y_{ijkl} = \mu + p_i + w_{k(i)} + (sw)_{jk(i)} + \varepsilon_{ijkl} \qquad (21.2) \]

\[ i = 1,2,3, \quad j = 1,2,3,4, \quad k = 1,2,3, \quad l = 1,\ldots,n_{ijk} \]

When $s_{j(i)}$ is eliminated from the model, SAS-Mixed pools the sum of squares due to site(plant)
with the sum of squares due to worker × site(plant), as indicated by the 27 degrees of freedom in
Table 21.7 for worker × site(plant). This is a reasonable process since the expected mean
squares of MS[site(plant)] and MS[worker × site(plant)] from Tables 21.3 and 21.4 are similar
when $\sigma_s^2 = 0$; that is, $E\{MS[\text{site(plant)}]\} = \sigma_\varepsilon^2 + 3.4789\sigma_{sw}^2$ and $E\{MS[\text{worker} \times \text{site(plant)}]\} =
\sigma_\varepsilon^2 + 2.8569\sigma_{sw}^2$ in Table 21.3, and the coefficients are even closer in the type III analysis.

The degrees of freedom, type I mean squares, and expected mean squares for the reduced
model are listed in Table 21.7. The expected mean square for worker × site(plant) is
$\sigma_\varepsilon^2 + 3.0643\sigma_{sw}^2$. The coefficient 3.0643 is computed as $3.0643 = [(9 \times 3.4789) + (18 \times 2.8569)]/27$,
which is the pooled coefficient of $\sigma_{sw}^2$ from the expected mean squares of site(plant) and
worker × site(plant) from the type I analysis. This equivalence occurs since the type I sums
of squares are sequential and thus the pooling process is additive. This type of
phenomenon does not occur with other types of sums of squares, as can be observed by comparing
the type III analyses in Tables 21.4, 21.6, and 21.8. The estimates of the variance components
for the reduced model using REML, ML, MIVQUE0, and type I–III methods are displayed in
Table 21.9. The estimates of the residual variance are approximately equal to 4.98 for all
methods except for MIVQUE0, which yields the largest value of 6.61. The ML estimate of
$\sigma_p^2$ is the smallest at 29.6 and the type III estimate is the largest at 58.6. The other methods
range from 47.6 to 53.1. Since there are only three levels of plant, $\sigma_p^2$ is the hardest variance
component to estimate; that is, less is known about $\sigma_p^2$ than the other variance components
TABLE 21.7

Type I Analysis of Variance Table with Expected Mean Squares for the Reduced Model

Type I Analysis of Variance

Source                 df   Mean Square    Expected Mean Square
Plant                  2    2313.758960    Var(Residual) + 3.9277 Var[worker × site(plant)] + 13.136 Var[worker(plant)] + 38.941 Var(plant)
Worker(plant)          6    456.941927     Var(Residual) + 3.9015 Var[worker × site(plant)] + 13.037 Var[worker(plant)]
Worker × site(plant)   27   99.175228      Var(Residual) + 3.0643 Var[worker × site(plant)]
Residual               82   4.983134       Var(Residual)


TABLE 21.8

Type III Analysis of Variance Table with Expected Mean Squares for the Reduced Model

Type III Analysis of Variance

Source                 df   Mean Square    Expected Mean Square
Plant                  2    1933.164549    Var(Residual) + 2.2998 Var[worker × site(plant)] + 9.1993 Var[worker(plant)] + 27.598 Var(plant)
Worker(plant)          6    324.942651     Var(Residual) + 2.3633 Var[worker × site(plant)] + 9.4533 Var[worker(plant)]
Worker × site(plant)   27   99.175228      Var(Residual) + 3.0643 Var[worker × site(plant)]
Residual               82   4.983134       Var(Residual)


TABLE 21.9

Estimates of the Variance Components for the Reduced Model

Covariance Parameters                    ML        REML      MVQ       Type I    Type II   Type III
Plant $\sigma_p^2$                       29.6002   50.4898   53.1030   47.5975   47.5975   58.5846
Worker(plant) $\sigma_w^2$               28.9415   28.9453   25.3453   25.4692   25.4692   26.1617
Worker × site(plant) $\sigma_{sw}^2$     28.7582   28.7706   25.4195   30.7388   30.7388   30.7388
Residual $\sigma_\varepsilon^2$          4.9825    4.9818    6.6119    4.9831    4.9831    4.9831


in the model. Estimates of $\sigma_w^2$ and $\sigma_{sw}^2$ are quite consistent across the different methods of
estimation, with $\hat{\sigma}_w^2$ ranging from 25.5 to 28.9 and $\hat{\sigma}_{sw}^2$ ranging from 25.4 to 30.7.

The analysis can continue by carrying out tests of hypotheses about the remaining variance
components to determine whether the model can be simplified further. The statistic
to test $H_0$: $\sigma_{sw}^2 = 0$ vs $H_a$: $\sigma_{sw}^2 > 0$ is

$$F_c = \frac{\text{MS}[\text{worker} \times \text{site(plant)}]}{\text{MS(Residual)}} = 19.90$$

which is distributed as an F-distribution with 27 and 82 degrees of freedom. The observed
significance level is less than 0.0001. Thus, $\sigma_{sw}^2$ is an important source of variance in the
process of generating the data and $(sw)_{jk(i)}$ should remain in the model. Again, $\sigma_{sw}^2$ is the
adaptation variance component where some workers work more effectively at some sites,
while other workers work more effectively at different sites. Tailoring each site to its
workers' needs can help reduce the variability in the system.

A $Q_w^*$ needs to be computed to test $H_0$: $\sigma_w^2 = 0$ vs $H_a$: $\sigma_w^2 > 0$. The mean square $Q_w^*$ needs
to be constructed from MS[worker x site(plant)] and MS(Residual) such that
$E(Q_w^*) = \sigma_\varepsilon^2 + 3.9015\,\sigma_{sw}^2$. Then

$$Q_w^* = 3.9015\left\{\frac{\text{MS}[\text{site} \times \text{worker(plant)}]}{3.0643}\right\} + \left[1 - \frac{3.9015}{3.0643}\right]\text{MS(Residual)}$$

$$= 1.2732\,\text{MS}[\text{site} \times \text{worker(plant)}] - 0.2732\,\text{MS(Residual)} = 124.9085$$

The test statistic is $F_{cw}$ = 456.9419/124.9085 = 3.66, which is distributed as an F-distribution
with 6 and $r_w$ degrees of freedom where

$$r_w = \frac{(Q_w^*)^2}{\{1.2732\,\text{MS}[\text{site} \times \text{worker(plant)}]\}^2/27 + [0.2732 \times \text{MS(Residual)}]^2/82}$$

$$= \frac{(124.9085)^2}{[1.2732 \times 99.1752]^2/27 + [0.2732 \times 4.9831]^2/82} = 26.42$$

The observed significance level for this test statistic is 0.0089, indicating that $\sigma_w^2$ is an
important part of the variation in the system.
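The construction of $Q_w^*$ and its approximate degrees of freedom can be traced numerically. The following is a minimal sketch, assuming only the type I mean squares and expected-mean-square coefficients listed in Table 21.7; the function name `satterthwaite_df` is illustrative and is not part of SAS or JMP.

```python
# Type I mean squares and degrees of freedom taken from Table 21.7.
ms_sw, df_sw = 99.175228, 27      # worker x site(plant)
ms_res, df_res = 4.983134, 82     # residual
ms_w, df_w = 456.941927, 6        # worker(plant)


def satterthwaite_df(coefs, mean_squares, dfs):
    """Return Q = sum(c_i * MS_i) and its approximate degrees of freedom,
    Q^2 / sum((c_i * MS_i)^2 / df_i)."""
    q = sum(c * ms for c, ms in zip(coefs, mean_squares))
    denom = sum((c * ms) ** 2 / df for c, ms, df in zip(coefs, mean_squares, dfs))
    return q, q * q / denom


# Coefficients chosen so that E(Q) equals E(MS[worker(plant)]) when the
# worker(plant) variance component is zero.
ratio = 3.9015 / 3.0643
coefs = [ratio, 1.0 - ratio]

q_w, r_w = satterthwaite_df(coefs, [ms_sw, ms_res], [df_sw, df_res])
f_w = ms_w / q_w

print(f"Q*_w = {q_w:.4f}, F = {f_w:.2f}, denominator df = {r_w:.2f}")
```

Because the coefficients are carried at full precision rather than rounded to 1.2732 and 0.2732, the printed value of $Q_w^*$ may differ from 124.9085 in the third decimal place.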

Finally, another $Q_p^*$ needs to be constructed from MS[worker(plant)], MS[worker x
site(plant)], and MS(Residual) such that

$$E(Q_p^*) = \sigma_\varepsilon^2 + 3.9277\,\sigma_{sw}^2 + 13.136\,\sigma_w^2$$

so that a statistic can be constructed to test $H_0$: $\sigma_p^2 = 0$ vs $H_a$: $\sigma_p^2 > 0$. The required $Q_p^*$ is

$$Q_p^* = \frac{13.136}{13.037}\{\text{MS}[\text{worker(plant)}]\} + \left[\frac{3.9277 - \frac{13.136}{13.037}(3.9015)}{3.0643}\right]\{\text{MS}[\text{site} \times \text{worker(plant)}]\}$$

$$\qquad + \left[1 - \frac{13.136}{13.037} - \frac{3.9277 - \frac{13.136}{13.037}(3.9015)}{3.0643}\right]\text{MS(Residual)}$$

$$= 1.0076\,\text{MS}[\text{worker(plant)}] - 0.0012\,\text{MS}[\text{site} \times \text{worker(plant)}] - 0.0065\,\text{MS(Residual)} = 460.2638$$
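The three coefficients of $Q_p^*$ come from matching expected mean squares term by term. The sketch below, which assumes only the rounded type I coefficients printed in Table 21.7, solves for them by back-substitution; the variable names are illustrative. With the rounded inputs the middle coefficient comes out near -0.0011 rather than the -0.0012 reported above, a difference that appears to be due to rounding of the published coefficients.

```python
# Expected-mean-square coefficients from Table 21.7, ordered as
# (residual, worker x site(plant), worker(plant)).
ems_target = (1.0, 3.9277, 13.136)   # E(MS[plant]) without the Var(plant) term
ems_worker = (1.0, 3.9015, 13.037)
ems_sw = (1.0, 3.0643, 0.0)
ems_res = (1.0, 0.0, 0.0)

# Solve a*EMS_worker + b*EMS_sw + c*EMS_res = ems_target, starting with the
# component that appears in only one mean square.
a = ems_target[2] / ems_worker[2]
b = (ems_target[1] - a * ems_worker[1]) / ems_sw[1]
c = ems_target[0] - a * ems_worker[0] - b * ems_sw[0]

# Type I mean squares from Table 21.7.
ms_worker, ms_sw, ms_res = 456.941927, 99.175228, 4.983134
q_p = a * ms_worker + b * ms_sw + c * ms_res

print(f"a = {a:.4f}, b = {b:.4f}, c = {c:.4f}, Q*_p = {q_p:.4f}")
```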





The test statistic is

$$F_{cp} = \frac{\text{MS(plant)}}{Q_p^*} = 5.027$$

which is distributed as an F-distribution with 2 and $r_p$ degrees of freedom where
$r_p = (Q_p^*)^2/D_p$ and

$$D_p = \frac{\{1.0076\,\text{MS}[\text{worker(plant)}]\}^2}{6} + \frac{\{0.0012\,\text{MS}[\text{site} \times \text{worker(plant)}]\}^2}{27} + \frac{[0.0065\,\text{MS(Residual)}]^2}{82}$$

Then $r_p$ = 5.9961. The observed significance level associated with this test is 0.0522, which
indicates that $\sigma_p^2$ is an important contributor to the variation in the system, but it is not as
important as either $\sigma_{sw}^2$ or $\sigma_w^2$. All of the variance components in model (21.2) are
significantly different from zero at $\alpha$ < 0.10 and the model cannot be simplified further.

The results in Tables 21.8 and 21.10 are from the type III sums of squares. SAS-Mixed
uses the expected mean squares to compute the appropriate divisors and the Satterthwaite
approximation to compute the denominator degrees of freedom. The significance levels
from the type III sums of squares analysis for testing $H_0$: $\sigma_p^2 = 0$ vs $H_a$: $\sigma_p^2 > 0$,
$H_0$: $\sigma_w^2 = 0$ vs $H_a$: $\sigma_w^2 > 0$, and $H_0$: $\sigma_{sw}^2 = 0$ vs $H_a$: $\sigma_{sw}^2 > 0$ are 0.0357, 0.0040, and less
than 0.0001, respectively. The significance levels from the type III analysis are a little smaller
than those from the type I analysis (for the first two tests), but in other problems the
significance levels of the type III analysis can be a little greater than those from the type I
analysis. Table 21.11 contains the results from the REML analysis of the reduced model. The
significance levels associated with the Z-value test statistics are 0.2118, 0.0869, and 0.0003
for the plant, worker(plant), and worker x site(plant) variance components, respectively.
These significance levels are considerably larger than those from the type I and III sums of
squares methods. This occurs because the Z-test is only asymptotically distributed as a
normal random variable, and in these cases, the numbers of degrees of freedom associated
with each of the variance component estimates are small. Unless there are a lot of levels
associated with an estimate of a variance component, the information associated with the
Z-value is not useful for testing hypotheses.
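The Z-value significance levels quoted above are one-sided normal tail areas for the ratio of each estimate to its standard error. A minimal sketch, assuming only the REML estimates and standard errors printed in Table 21.11 (the helper name `upper_tail_normal` is illustrative):

```python
import math


def upper_tail_normal(z):
    """P(Z > z) for a standard normal random variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# (estimate, standard error) pairs from Table 21.11.
reml = {
    "plant": (50.4898, 63.0887),
    "worker(plant)": (28.9453, 21.2802),
    "worker x site(plant)": (28.7706, 8.3753),
}

p_values = {}
for name, (estimate, std_error) in reml.items():
    z = estimate / std_error
    p_values[name] = upper_tail_normal(z)
    print(f"{name}: Z = {z:.2f}, one-sided p = {p_values[name]:.4f}")
```

The tail areas rely on a large-sample normal approximation, which is why they disagree so strongly with the F-based significance levels when a component has only a few levels.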

TABLE 21.10

Type III Tests of Hypotheses about the Variance Components for the Reduced Model

Type III Analysis of Variance

| Source | df | Error Term | Error df | F-Value | Pr > F |
|---|---|---|---|---|---|
| Plant | 2 | 0.9731 MS[worker(plant)] + 0.0269 MS(Residual) | 6.0051 | 6.11 | 0.0357 |
| Worker(plant) | 6 | 0.7713 MS[worker x site(plant)] + 0.2287 MS(Residual) | 27.809 | 4.19 | 0.0040 |
| Worker x site(plant) | 27 | MS(Residual) | 82 | 19.90 | <0.0001 |
| Residual | 82 | — | — | — | — |
TABLE 21.11

REML Estimates of the Variance Components for the Reduced Model. Satterthwaite-Type
Confidence Intervals for the Variance Components

proc mixed data=EX_21_1 METHOD=REML cl covtest;
   class plant worker site;
   model efficiency=;
   random plant worker(plant) site*worker(plant);
run;

Covariance Parameter Estimates

| Covariance Parameter | Estimate | Standard Error | Z-Value | Pr Z | α | Lower | Upper |
|---|---|---|---|---|---|---|---|
| Plant | 50.4898 | 63.0887 | 0.80 | 0.2118 | 0.05 | 11.2442 | 12100 |
| Worker(plant) | 28.9453 | 21.2802 | 1.36 | 0.0869 | 0.05 | 10.0867 | 271.39 |
| Worker x site(plant) | 28.7706 | 8.3753 | 3.44 | 0.0003 | 0.05 | 17.4775 | 56.0425 |
| Residual | 4.9818 | 0.7776 | 6.41 | <0.0001 | 0.05 | 3.7506 | 6.9406 |


The plant variance component measures the differences among plants within the population
of plants. Since there is a large amount of variability, one of the plants most likely
has procedures in place that enable the workers to work more efficiently. The importance
of the worker(plant) variance component indicates that some workers are more efficient
than others, so a training program could help improve the efficiency of workers not
performing as well as others. The worker x site(plant) variance component is an adaptation
variance component: some workers are better adapted to some sites and perform less well
at others, while other workers do well at the sites where the former did not.


21.5 Confidence Intervals 

The next step in the analysis is to construct confidence intervals about the variance
components of the reduced model in Equation 21.1. The results displayed in Tables 21.11
and 21.12 are the estimates of the variance components for the reduced model using REML
and type I sums of squares, respectively. The confidence intervals for the REML
solution are computed using the method in Section 20.2.2 with df = 2(Z-value)$^2$. The degrees
of freedom are 1.28 for $\hat{\sigma}_p^2$, 3.70 for $\hat{\sigma}_w^2$, 23.60 for $\hat{\sigma}_{sw}^2$, and 82.10 for $\hat{\sigma}_\varepsilon^2$. Table 21.13 contains the
computed degrees of freedom for the REML confidence intervals as well as intervals that are
expressed in standard deviation units. The confidence intervals computed for the type I
analysis are the Wald intervals of Section 20.3.4 (as noted by the fact that the intervals are
symmetric about the estimate and some of the lower limits are negative), except for $\sigma_\varepsilon^2$,
where the interval is computed using the results in Section 20.2.2 based on 81.93 degrees of
freedom. Confidence intervals based on the type I sums of squares can be recomputed
using the method of Section 20.2.2 with df = 2(Z-value)$^2$. For $\hat{\sigma}_p^2$, the resulting number of
degrees of freedom is 1.4677, providing a 95% confidence interval as [11.2668 < $\sigma_p^2$ < 5984.61].
For $\hat{\sigma}_w^2$, the resulting number of degrees of freedom is 3.9682, providing a 95% confidence
interval as [9.1149 < $\sigma_w^2$ < 212.97]. For $\hat{\sigma}_{sw}^2$, the resulting number of degrees of freedom is
20.8392, providing a 95% confidence interval as [18.1625 < $\sigma_{sw}^2$ < 62.98]. The results of these
recomputed confidence intervals are summarized in Table 21.14.
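The degrees of freedom quoted in this paragraph follow directly from df = 2(Z-value)$^2$ with Z taken as the estimate divided by its standard error. A minimal sketch, assuming the estimates and standard errors as printed in Tables 21.11 and 21.12 (small discrepancies in the last digits come from the rounded inputs; the helper name `z_based_df` is illustrative):

```python
def z_based_df(estimate, std_error):
    """Approximate degrees of freedom 2 * Z^2, with Z = estimate / std_error."""
    z = estimate / std_error
    return 2.0 * z * z


# (estimate, standard error) pairs from Table 21.11 (REML).
reml = {
    "plant": (50.4898, 63.0887),
    "worker(plant)": (28.9453, 21.2802),
    "worker x site(plant)": (28.7706, 8.3753),
    "residual": (4.9818, 0.7776),
}
# (estimate, standard error) pairs from Table 21.12 (type I sums of squares).
type1 = {
    "plant": (47.5975, 55.5617),
    "worker(plant)": (25.4692, 18.0815),
    "worker x site(plant)": (30.7388, 9.5227),
    "residual": (4.9831, 0.7786),
}

df_reml = {name: z_based_df(*pair) for name, pair in reml.items()}
df_type1 = {name: z_based_df(*pair) for name, pair in type1.items()}

for name in reml:
    print(f"{name}: REML df = {df_reml[name]:.4f}, type I df = {df_type1[name]:.4f}")
```

Turning these degrees of freedom into interval endpoints additionally requires chi-square percentage points for fractional degrees of freedom, which the sketch leaves out.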

Next the usual Satterthwaite approximation is used to provide confidence intervals
about the variance components using the information from the type I sums of squares.
The method in Section 20.2.1 is used to construct a confidence interval about $\sigma_\varepsilon^2$. A 95%
confidence interval about $\sigma_\varepsilon^2$ is

$$\frac{82[\text{MS(Residual)}]}{\chi^2_{(\alpha/2),82}} \le \sigma_\varepsilon^2 \le \frac{82[\text{MS(Residual)}]}{\chi^2_{1-(\alpha/2),82}}$$


TABLE 21.12

Type I Sums of Squares Estimates of Variance Components for the Reduced Model. Wald-Type
Confidence Intervals for Variances Other than Residual

Covariance Parameter Estimates

| Covariance Parameter | Estimate | Standard Error | Z-Value | Pr Z | α | Lower | Upper |
|---|---|---|---|---|---|---|---|
| Plant | 47.5975 | 55.5617 | 0.86 | 0.3916 | 0.05 | -61.3015 | 156.50 |
| Worker(plant) | 25.4692 | 18.0815 | 1.41 | 0.1590 | 0.05 | -9.9699 | 60.9083 |
| Worker x site(plant) | 30.7388 | 9.5227 | 3.23 | 0.0012 | 0.05 | 12.0746 | 49.4030 |
| Residual | 4.9831 | 0.7786 | | <0.0001 | 0.05 | 3.7509 | 6.9440 |
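The symmetric limits in Table 21.12 can be reproduced as estimate plus or minus a normal percentage point times the standard error. A minimal sketch, assuming the estimates and standard errors printed in the table and the standard library's `statistics.NormalDist` for the 0.975 quantile:

```python
from statistics import NormalDist

# Two-sided 95% normal critical value (about 1.96).
z975 = NormalDist().inv_cdf(0.975)

# (estimate, standard error) pairs from Table 21.12.
type1 = {
    "plant": (47.5975, 55.5617),
    "worker(plant)": (25.4692, 18.0815),
    "worker x site(plant)": (30.7388, 9.5227),
}

# Wald interval: symmetric about the estimate, so the lower limit can be negative.
wald = {
    name: (estimate - z975 * std_error, estimate + z975 * std_error)
    for name, (estimate, std_error) in type1.items()
}

for name, (lower, upper) in wald.items():
    print(f"{name}: [{lower:.4f}, {upper:.4f}]")
```

The negative lower limits for plant and worker(plant) illustrate why Wald intervals are a poor choice for variance components with few degrees of freedom.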


TABLE 21.13

REML Estimates of the Variance Components with the Degrees of Freedom Used in the
Computations. The Standard Deviations and Square Roots of the Intervals Are Included

| Covariance Parameter | Estimate | Lower | Upper | df | SD | Low SD | High SD |
|---|---|---|---|---|---|---|---|
| Plant | 50.4898 | 11.2442 | 12100 | 1.2810 | 7.10562 | 3.35323 | 109.998 |
| Worker(plant) | 28.9453 | 10.0867 | 271.39 | 3.7003 | 5.38009 | 3.17596 | 16.474 |
| Worker x site(plant) | 28.7706 | 17.4775 | 56.0425 | 23.6009 | 5.36382 | 4.18061 | 7.486 |
| Residual | 4.9818 | 3.7506 | 6.9406 | 82.1009 | 2.23200 | 1.93664 | 2.635 |


TABLE 21.14

Recomputed Confidence Intervals Using General Satterthwaite Approximation in Section 20.2.2
Based on Type I Sums of Squares

| Covariance Parameter | Estimate | df | New Low | New Up | SD | Low SD | High SD |
|---|---|---|---|---|---|---|---|
| Plant | 47.5975 | 1.4677 | 11.2668 | 5984.61 | 6.89909 | 3.35661 | 77.3602 |
| Worker(plant) | 25.4692 | 3.9682 | 9.1149 | 212.97 | 5.04670 | 3.01909 | 14.5936 |
| Worker x site(plant) | 30.7388 | 20.8392 | 18.1625 | 62.98 | 5.54426 | 4.26175 | 7.9360 |
| Residual | 4.9831 | 81.9300 | 3.7505 | 6.95 | 2.23229 | 1.93662 | 2.6353 |
or

$$\frac{82(4.9831)}{108.937} \le \sigma_\varepsilon^2 \le \frac{82(4.9831)}{58.8446}$$

or

$$3.751 < \sigma_\varepsilon^2 < 6.944$$

The above 95% confidence interval about $\sigma_\varepsilon^2$ is identical to that in Table 21.11 for the
Residual.
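The arithmetic behind this interval is short enough to check directly. A minimal sketch, assuming the chi-square percentage points 108.937 and 58.8446 quoted in the text for 82 degrees of freedom:

```python
df_res = 82
ms_res = 4.9831                 # MS(Residual), type I analysis

# Upper and lower 2.5% chi-square points for 82 df, as quoted in the text.
chi2_upper = 108.937
chi2_lower = 58.8446

# df * MS / chi-square point gives each endpoint of the interval.
lower = df_res * ms_res / chi2_upper
upper = df_res * ms_res / chi2_lower

print(f"{lower:.4f} < residual variance < {upper:.4f}")
```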

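For the Satterthwaite confidence interval about $\sigma_{sw}^2$ using the expected mean squares from
Table 21.7, the method of moments solution (and estimate) of $\sigma_{sw}^2$ can be expressed as

$$\hat{\sigma}_{sw}^2 = \frac{\text{MS}[\text{worker} \times \text{site(plant)}]}{3.064} - \frac{\text{MS(Residual)}}{3.064}$$

Then $r_{sw}\hat{\sigma}_{sw}^2/\sigma_{sw}^2$ is approximately distributed as a chi-square random variable with $r_{sw}$
degrees of freedom where

$$r_{sw} = \frac{(\hat{\sigma}_{sw}^2)^2}{\left\{\dfrac{\text{MS}[\text{worker} \times \text{site(plant)}]}{3.064}\right\}^2 \Big/ 27 + \left\{\dfrac{\text{MS(Residual)}}{3.064}\right\}^2 \Big/ 82} = 24.33$$

An approximate 95% confidence interval about $\sigma_{sw}^2$ is

$$\frac{r_{sw}\hat{\sigma}_{sw}^2}{\chi^2_{.025,24.33}} \le \sigma_{sw}^2 \le \frac{r_{sw}\hat{\sigma}_{sw}^2}{\chi^2_{.975,24.33}}$$

or

$$18.80 < \sigma_{sw}^2 < 59.17$$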
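The estimate and its Satterthwaite degrees of freedom can be reproduced from two mean squares. A minimal sketch, assuming the type I mean squares of Table 21.7 and the unrounded divisor 3.0643 from the same table (the text rounds it to 3.064):

```python
ms_sw, df_sw = 99.175228, 27     # MS[worker x site(plant)]
ms_res, df_res = 4.983134, 82    # MS(Residual)
k = 3.0643                       # coefficient of the component in E(MS[worker x site(plant)])

# Method of moments estimate: a difference of two scaled mean squares.
est_sw = ms_sw / k - ms_res / k

# Satterthwaite degrees of freedom for that linear combination.
r_sw = est_sw ** 2 / ((ms_sw / k) ** 2 / df_sw + (ms_res / k) ** 2 / df_res)

print(f"estimate = {est_sw:.4f}, approximate df = {r_sw:.4f}")
```

The interval endpoints then need chi-square percentage points for 24.33 degrees of freedom, which are not computed here.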

To construct the confidence interval about $\sigma_w^2$, express the solution for $\hat{\sigma}_w^2$ as (obtain
information from the Error term of Table 21.15)

$$\hat{\sigma}_w^2 = \{\text{MS}[\text{worker(plant)}] - 1.2732\,\text{MS}[\text{worker} \times \text{site(plant)}] + 0.2732\,\text{MS(Residual)}\}/13.037 = 25.4692$$

The approximate degrees of freedom associated with $\hat{\sigma}_w^2$ is $r_w$ = 3.12, which leads to an
approximate 95% confidence interval as [8.297 < $\sigma_w^2$ < 327.87].

To obtain a 95% confidence interval about $\sigma_p^2$, one can use the results in Table 21.15 to
express the estimate for $\sigma_p^2$ as

$$\hat{\sigma}_p^2 = \{\text{MS(plant)} - 1.0076\,\text{MS}[\text{worker(plant)}] + 0.0012\,\text{MS}[\text{worker} \times \text{site(plant)}] + 0.0065\,\text{MS(Residual)}\}/38.941 = 47.60$$
TABLE 21.15

Type I Tests of Hypotheses about the Variance Components for the Reduced Model

Type I Analysis of Variance

| Source | df | Error Term | Error df | F-Value | Pr > F |
|---|---|---|---|---|---|
| Plant | 2 | 1.0076 MS[worker(plant)] - 0.0012 MS[worker x site(plant)] - 0.0065 MS(Residual) | 5.9961 | 5.03 | 0.0522 |
| Worker(plant) | 6 | 1.2732 MS[worker x site(plant)] - 0.2732 MS(Residual) | 26.42 | 3.66 | 0.0089 |
| Worker x site(plant) | 27 | MS(Residual) | 82 | 19.90 | <0.0001 |
| Residual | 82 | — | — | — | — |


The number of degrees of freedom associated with $\hat{\sigma}_p^2$ is $r_p$ = 1.27. The resulting
approximate 95% confidence interval is [10.58 < $\sigma_p^2$ < 12078.74]. The set of confidence intervals
computed using the Satterthwaite approximation is displayed in Table 21.16.

The final stage in the analysis is to use the confidence region method in Section 20.2
to construct confidence intervals about the variance components $\sigma_p^2$, $\sigma_w^2$, and $\sigma_{sw}^2$. A 95%
confidence interval about $\sigma_{sw}^2$ has endpoints $c_{sw}$ and $d_{sw}$ where

$$c_{sw} = \frac{27\,\text{MS}[\text{worker} \times \text{site(plant)}]/\chi^2_{.025,27} - 82\,\text{MS(Residual)}/\chi^2_{.975,82}}{3.064} = 17.96$$

and

$$d_{sw} = \frac{27\,\text{MS}[\text{worker} \times \text{site(plant)}]/\chi^2_{.975,27} - 82\,\text{MS(Residual)}/\chi^2_{.025,82}}{3.064} = 58.74$$

Simplifying these two expressions, one gets the confidence interval 17.96 < $\sigma_{sw}^2$ < 58.74.
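The two endpoints can be checked with a few lines of arithmetic. In the sketch below the percentage points for 82 degrees of freedom are the ones quoted earlier in the chapter, while those for 27 degrees of freedom (43.1945 and 14.5734) are standard chi-square table values supplied here as an assumption. The divisor 3.0643 is the unrounded coefficient from Table 21.7.

```python
ms_sw, df_sw = 99.175228, 27     # MS[worker x site(plant)]
ms_res, df_res = 4.983134, 82    # MS(Residual)
k = 3.0643                       # coefficient from Table 21.7

# Upper-tail notation: chi2_hi is the upper 2.5% point, chi2_lo the lower 2.5% point.
chi2_27_hi, chi2_27_lo = 43.1945, 14.5734   # assumed standard table values
chi2_82_hi, chi2_82_lo = 108.937, 58.8446   # values quoted in the text

c_sw = (df_sw * ms_sw / chi2_27_hi - df_res * ms_res / chi2_82_lo) / k
d_sw = (df_sw * ms_sw / chi2_27_lo - df_res * ms_res / chi2_82_hi) / k

# A variance cannot be negative, so a negative endpoint is replaced by zero.
interval = (max(0.0, c_sw), max(0.0, d_sw))

print(f"{interval[0]:.4f} < variance component < {interval[1]:.2f}")
```

The truncation at zero has no effect for this component but matters for the worker(plant) and plant components, whose lower endpoints are negative.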
The two endpoints of a 95% confidence interval about $\sigma_w^2$ are $c_w$ and $d_w$ where

$$c_w = \frac{6\,\text{MS}[\text{worker(plant)}]/\chi^2_{.025,6} - r_w Q_w^*/\chi^2_{.975,26.42}}{13.037} = -3.335$$

TABLE 21.16

Confidence Intervals Based on the Usual Satterthwaite Approximation Using
Type I Sums of Squares

| Source | Estimate | RDF | Low s2 | High s2 | SD | Low SD | High SD |
|---|---|---|---|---|---|---|---|
| Plant | 47.5975 | 1.2667 | 10.5468 | 12078.74 | 6.89910 | 3.24759 | 109.903 |
| Worker(plant) | 25.4685 | 3.1152 | 8.2972 | 327.87 | 5.04664 | 2.88048 | 18.107 |
| Worker x site(plant) | 30.7385 | 24.3347 | 18.7971 | 59.17 | 5.54423 | 4.33557 | 7.692 |
| Residual | 4.9831 | 82.0000 | 3.7509 | 6.94 | 2.23229 | 1.93673 | 2.635 |






and

$$d_w = \frac{6\,\text{MS}[\text{worker(plant)}]/\chi^2_{.975,6} - r_w Q_w^*/\chi^2_{.025,26.42}}{13.037} = 164.00$$

where $r_w$ = 26.42 and $Q_w^*$ = 124.9085. Simplifying these expressions gives the interval
0 < $\sigma_w^2$ < 164.00.

The endpoints for a 95% confidence interval about $\sigma_p^2$ are $c_p$ and $d_p$ where

$$c_p = \frac{2\,\text{MS(plant)}/\chi^2_{.025,2} - r_p Q_p^*/\chi^2_{.975,5.9961}}{38.941} = -41.2471$$

and

$$d_p = \frac{2\,\text{MS(plant)}/\chi^2_{.975,2} - r_p Q_p^*/\chi^2_{.025,5.9961}}{38.941} = 2341.94$$

where $r_p$ = 5.9961 and $Q_p^*$ = 460.2638. Simplifying gives the interval 0 < $\sigma_p^2$ < 2341.94.

Table 21.17 displays the confidence intervals computed using the confidence region
method.

All four of the above methods of computing confidence intervals are summarized in
Table 21.18, where the entries in the table are the lower and upper limits for the respective
variance components.


TABLE 21.17

Summary of Confidence Intervals Computed Using the
Confidence Region Method of Section 20.2.5

| Source | Low v | High v | Low SD | High SD |
|---|---|---|---|---|
| Plant | 0.0000 | 2341.94 | 0.00000 | 48.3936 |
| Worker(plant) | 0.0000 | 164.00 | 0.00000 | 12.8061 |
| Worker x site(plant) | 17.9644 | 58.74 | 4.23845 | 7.6641 |


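TABLE 21.18

Summary of the Four Types of 95% Confidence Intervals, with Upper and Lower Limits

| Source | REML Lower | REML Upper | General Satterthwaite Lower | General Satterthwaite Upper | Usual Satterthwaite Lower | Usual Satterthwaite Upper | Confidence Region Lower | Confidence Region Upper |
|---|---|---|---|---|---|---|---|---|
| Plant | 11.24 | 12,100 | 11.27 | 5984.61 | 10.56 | 12079 | 0 | 2341.94 |
| Worker | 10.09 | 271.39 | 9.11 | 213.00 | 8.30 | 327.87 | 0 | 164.00 |
| Worker x site | 17.48 | 56.04 | 18.16 | 62.98 | 18.80 | 59.17 | 17.96 | 58.74 |
| Residual | 3.75 | 6.94 | 3.75 | 6.94 | 3.75 | 6.94 | 3.75 | 6.94 |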
21.6 Computations Using JMP

Figure 21.1 contains a portion of the JMP® data table for Example 21 where plant, site
and worker are specified to be nominal variables. Figure 21.2 contains the fit model
screen where plant, site(plant), worker(plant), and site x worker(plant) are specified as
random effects and the REML method is selected. The estimates of the variance components
for the full model are shown in Figure 21.3, including the estimated standard
errors and 95% confidence intervals, which correspond to the REML estimates in Table
21.2. The reduced model fit model screen is displayed in Figure 21.4, where the site(plant)
term was removed from the fit model screen in Figure 21.2. The REML estimates from
JMP in Figure 21.5 are identical to those from SAS-Mixed that were displayed
in Table 21.11. The fit model process in JMP provides excellent results for random effects
models.





[Figure: JMP data table example_21 with columns plant, worker, site, and efficiency; 118 rows in total, of which the first 21 are shown.]

FIGURE 21.1 Part of the JMP table of data for Example 21.
[Figure: JMP Fit Model dialog. Response: efficiency; Personality: Standard Least Squares; Method: REML (Recommended); model effects plant, site[plant], worker[plant], and site*worker[plant], each marked Random.]

FIGURE 21.2 JMP fit model screen for full model for Example 21.


21.7 Concluding Remarks 

The confidence intervals obtained from the four methods are quite different, with the
largest discrepancies occurring for variance components having a small number of degrees
of freedom. The REML and usual Satterthwaite intervals are quite similar, whereas the
general Satterthwaite intervals using the type I sums of squares and the confidence region
intervals for the plant and worker variance components are quite variable. In this chapter, a
complex unbalanced random effects model was analyzed in detail using all six methods of
estimation described in Chapter 19 to estimate the variance components. The methods for
testing hypotheses and constructing confidence intervals described in Chapter 20 were
demonstrated. One of the variance components in the original model was found to be
nonsignificant; thus a model-building approach was used to provide an adequate model with
meaningful estimates of the remaining variance components.

At least two methods for estimating variance components and for constructing confidence
intervals should be utilized to analyze an unbalanced data set in order to determine
how much the results vary between methods. If both methods yield similar
results, then it will not make any difference which method is used and one can have
confidence in either method. If the results are very different, then one should investigate what
[Figure: JMP Fit Least Squares report for the response efficiency (118 observations). The REML Variance Component Estimates panel lists plant, site[plant], worker[plant], site*worker[plant], and Residual; the site[plant] component is estimated as 0, and -2 LogLikelihood = 640.19159809.]

FIGURE 21.3 REML results for Example 21 from JMP.


[Figure: JMP Fit Model dialog for the reduced model. Response: efficiency; Personality: Standard Least Squares; Method: REML (Recommended); model effects plant, worker[plant], and site*worker[plant], each marked Random.]

FIGURE 21.4 JMP fit model screen for reduced model for Example 21.
[Figure: JMP Fit Least Squares report for the response efficiency, reduced model. Summary of Fit: RSquare 0.959994, RSquare Adj 0.959994, Root Mean Square Error 2.231998, Mean of Response 91.8678, Observations 118.]

REML Variance Component Estimates

| Random Effect | Var Ratio | Var Component | Std Error | 95% Lower | 95% Upper | Pct of Total |
|---|---|---|---|---|---|---|
| plant | 10.134798 | 50.489682 | 63.088656 | 11.244129 | 12099.639 | 44.607 |
| worker[plant] | 5.8102391 | 28.945534 | 21.280423 | 10.086725 | 271.39345 | 25.573 |
| site*worker[plant] | 5.7751341 | 28.770647 | 8.3753065 | 17.477538 | 56.042651 | 25.419 |
| Residual | | 4.9818146 | 0.7775504 | 3.7505522 | 6.9406261 | 4.401 |
| Total | | 113.18768 | | | | 100.000 |

-2 LogLikelihood = 640.19159809

FIGURE 21.5 JMP REML results for reduced model for Example 21.


characteristics of the design are contributing to those differences. The results are described 
using both SAS-Mixed and the fit model process of JMP. 


21.8 Exercises 

21.1 A group of sheep producers designed a study to evaluate the sources of variation
in birth weights of lambs. Six producers were selected at random from the group
of all producers. Each producer selected three sires (males) at random from their
group of sires. Each sire was mated to two to six dams (females). Each dam
produced either one, two, or three lambs and the birth weight of each lamb was
measured. The data are in the following table where birthwt1-birthwt6 represent
data from producers 1 to 6.

1) Write out a model that can be used to model the variation in this data set.

2) Provide estimates of the variance components using the six methods discussed
in Chapter 19.

3) Provide tests of hypotheses that each variance component is equal to zero
(all except the residual).

4) Provide confidence intervals using the REML, the recomputed intervals for
type I sums of squares and the usual Satterthwaite approximation.

5) Discuss the similarities and differences provided by the different estimation
and confidence interval techniques.
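Data Set for Exercise 21.1

| Sire | Dam | Lamb | Birthwt1 | Birthwt2 | Birthwt3 | Birthwt4 | Birthwt5 | Birthwt6 |
|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 19.10 | 23.96 | 17.67 | 23.10 | 21.67 | 16.77 |
| 1 | 2 | 1 | 17.67 | 20.84 | 16.71 | 20.14 | 20.43 | 19.38 |
| 1 | 2 | 2 | 20.15 | — | 16.75 | — | — | — |
| 1 | 2 | 3 | — | — | 15.79 | — | — | — |
| 1 | 3 | 1 | 18.67 | 18.85 | 16.45 | 20.35 | 22.76 | 18.44 |
| 1 | 3 | 2 | — | — | — | 19.83 | — | — |
| 1 | 3 | 3 | — | — | — | 22.34 | — | — |
| 1 | 4 | 1 | 18.12 | 20.61 | — | — | 22.10 | 23.62 |
| 1 | 4 | 2 | 17.98 | — | — | — | — | 22.33 |
| 1 | 5 | 1 | 19.60 | — | — | — | 20.60 | 16.65 |
| 1 | 5 | 2 | 18.18 | — | — | — | — | — |
| 1 | 5 | 3 | 20.64 | — | — | — | — | — |
| 1 | 6 | 1 | — | — | — | — | — | 18.48 |
| 2 | 1 | 1 | 20.00 | 22.07 | 18.37 | 22.50 | 21.97 | 21.47 |
| 2 | 1 | 2 | — | 21.07 | — | 23.54 | — | — |
| 2 | 2 | 1 | 19.10 | 23.32 | 21.30 | 19.68 | 20.46 | 22.38 |
| 2 | 2 | 2 | — | — | — | 22.36 | — | — |
| 2 | 3 | 1 | 19.71 | 23.52 | 19.08 | 20.09 | 18.69 | 21.99 |
| 2 | 3 | 2 | — | 23.41 | — | — | 18.29 | 18.57 |
| 2 | 4 | 1 | 19.24 | 21.30 | 18.14 | 22.74 | — | — |
| 2 | 4 | 2 | — | 21.72 | — | 19.95 | — | — |
| 2 | 4 | 3 | — | 21.91 | — | — | — | — |
| 2 | 5 | 1 | 17.18 | 23.87 | 20.32 | 22.87 | — | — |
| 3 | 1 | 1 | 19.52 | 21.65 | 21.33 | 19.98 | — | 18.80 |
| 3 | 1 | 2 | — | — | — | — | — | 19.62 |
| 3 | 1 | 3 | — | — | — | — | — | 20.68 |
| 3 | 2 | 1 | 19.62 | 19.17 | 20.45 | 19.86 | — | 17.24 |
| 3 | 3 | 1 | 17.99 | 19.84 | 20.25 | — | — | 19.88 |
| 3 | 4 | 1 | 19.15 | 19.06 | 22.01 | — | — | — |
| 3 | 4 | 2 | 20.01 | 19.53 | — | — | — | — |
| 3 | 5 | 1 | 19.90 | — | 22.58 | — | — | — |
| 3 | 6 | 1 | 18.46 | — | — | — | — | — |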


21.2 A feed manufacturer makes feed with a supplement mixed in and the amount of
supplement should be 3.5 μg kg$^{-1}$. The manufacturer designed a study to evaluate
the ability of the mixing process to mix the supplement into the feed. Three
days were selected at random from the next 30 days of operation; three runs per
day were selected from the 24 runs made per day, three batches were randomly
selected from each run, and two to four samples were extracted from each batch.
The amount of supplement within each sample was determined with the data
displayed in the following table.

1) Write out a model that can be used to model the variation in this data set.

2) Provide estimates of the variance components using the six methods discussed
in Chapter 19.

3) Provide tests of hypotheses that each variance component is equal to zero
(all except the residual).

4) Provide confidence intervals using the REML, the recomputed intervals for
type I sums of squares and the usual Satterthwaite approximation.

5) Discuss the similarities and differences provided by the different estimation
and confidence interval techniques.

6) Use the model building technique to simplify the model and repeat parts 1-5
for this reduced model.


Drug Concentration Data for Exercise 21.2 Where Amt_i_j Denotes Amount on Day i and Run j


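| Batch | Sample | Amt_1_1 | Amt_1_2 | Amt_1_3 | Amt_2_1 | Amt_2_2 | Amt_2_3 | Amt_3_1 | Amt_3_2 | Amt_3_3 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 3.28 | 2.78 | 3.08 | 3.31 | 3.35 | 2.86 | 3.32 | 2.61 | 3.68 |
| 1 | 2 | 3.14 | 2.88 | 3.13 | 3.38 | 3.28 | 2.64 | 3.61 | 3.24 | 3.48 |
| 1 | 3 | 3.42 | 2.91 | 3.14 | 3.66 | 3.43 | 2.24 | — | 2.92 | — |
| 2 | 1 | 2.97 | 2.87 | 3.04 | 3.60 | 3.14 | 2.74 | 3.55 | 2.59 | 3.68 |
| 2 | 2 | 3.30 | 2.78 | 2.68 | 3.33 | 3.09 | 2.73 | 3.66 | 2.65 | 3.78 |
| 2 | 3 | 3.21 | — | 3.10 | — | — | 2.40 | 3.50 | 2.93 | 3.67 |
| 2 | 4 | 3.30 | — | 2.97 | — | — | — | 3.44 | — | — |
| 3 | 1 | 3.43 | 3.01 | 2.73 | 3.42 | 3.06 | 2.75 | 3.56 | 2.91 | 3.23 |
| 3 | 2 | 3.17 | 2.70 | 2.90 | 3.25 | 3.29 | 2.59 | 3.35 | 2.97 | 3.43 |
| 3 | 3 | 2.89 | 2.78 | 2.80 | 3.60 | 3.65 | 2.51 | 3.25 | 3.12 | 3.47 |
| 3 | 4 | — | — | 2.93 | 3.35 | 3.15 | 2.45 | — | 2.62 | — |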






Analysis of Mixed Models 


Mixed models are used to describe data from experiments or studies that need more 
than one variance-covariance parameter and involve some fixed effects parameters. The 
unequal variance models of Chapter 2 are mixed models as they involve more than one 
variance component. The models described in Chapters 18-21 are called random effects 
models, but each has an unknown mean parameter, thus the models are essentially mixed 
models. The definition of a mixed model used in Chapter 18 revolved around having some 
of the treatment structure with fixed effects and more than one variance component. So 
the general definition of a mixed model is one with some fixed effect parameters and more 
than one parameter in the covariance structure. Among the models that are included 
in this definition are randomized complete blocks models, incomplete blocks models, 
split-plot-type models, strip-plot-type models, repeated measures type models, random 
coefficients models, multilevel models, and hierarchical models. 

The mixed model involves three parts: 1) the fixed effects part of the model, 2) the 
random effects part of the model, and 3) the residual part of the model. Consequently, the 
analysis of a mixed model consists of two types of analyses, an analysis of the random 
effects and residual parts of the model and an analysis of the fixed effects part of the 
model. This chapter discusses the construction of mixed models and the necessary steps 
for analyzing both the random effects part of the model and the fixed effects part of the 
model. The results of this chapter are provided as a bridge between theoretical results and 
an understanding of the concepts. 


22.1 Introduction to Mixed Models 

The model for describing an experiment with both random effects factors and fixed effects 
factors is called a mixed model. Since there are two types of factors, the resulting model 
has two parts, a random effects part and a fixed effects part. In order to construct such 
models, the following rule is used to determine whether a specific interaction is a random 
or a fixed effect. 
Rule: If a main effect is a random effect, then any interaction involving that main effect
is also a random effect. The only interactions that are fixed effects are those whose
corresponding main effects are all fixed effects.

For example, a model for a three-way treatment structure where the levels of A and B are
fixed effects and the levels of C are random effects is

$$y_{ijkm} = \mu + \alpha_i + \beta_j + \gamma_{ij} + c_k + d_{ik} + f_{jk} + g_{ijk} + \varepsilon_{ijkm}$$

where $\mu$ denotes the mean response, $\alpha_i$ denotes the effect of the $i$th level of fixed factor A,
$\beta_j$ denotes the effect of the $j$th level of fixed factor B, $\gamma_{ij}$ denotes the interaction effect between
the levels of A and the levels of B, $c_k$ denotes the effect of the $k$th level of random factor C,
$d_{ik}$ denotes the interaction between the levels of A and the levels of C, $f_{jk}$ denotes the
interaction between the levels of B and the levels of C, $g_{ijk}$ denotes the three-way interaction
between the levels of A, B and C, and $\varepsilon_{ijkm}$ denotes the residual effect.

The fixed effects part of the model is $\mu + \alpha_i + \beta_j + \gamma_{ij}$, the random effects part of the model
is $c_k + d_{ik} + f_{jk} + g_{ijk}$, and the residual part of the model is $\varepsilon_{ijkm}$. Simple assumptions about the
terms of the random effects and residual parts of the model are $c_k \sim$ i.i.d. $N(0, \sigma_c^2)$, $d_{ik} \sim$ i.i.d.
$N(0, \sigma_d^2)$, $f_{jk} \sim$ i.i.d. $N(0, \sigma_f^2)$, $g_{ijk} \sim$ i.i.d. $N(0, \sigma_g^2)$, $\varepsilon_{ijkm} \sim$ i.i.d. $N(0, \sigma_\varepsilon^2)$, and $c_k$, $d_{ik}$, $f_{jk}$, $g_{ijk}$, and $\varepsilon_{ijkm}$
are all independent random variables. The interaction between A and B, denoted by $\gamma_{ij}$,
is a fixed effect, while all other interactions are random effects since they all involve
the random factor C denoted by the subscript $k$. In general, the mixed model will also
involve terms corresponding to the design structure, but such terms are not included in
the above model.
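The rule can be applied mechanically to the three-way example. A minimal sketch, assuming terms are written as factor names joined by `*`; the function name `effect_type` and that string convention are illustrative choices, not notation from the text.

```python
def effect_type(term, random_factors):
    """Classify a main effect or interaction as 'random' or 'fixed'.

    A term is random as soon as any factor in it is random; it is fixed
    only when every factor in it is fixed.
    """
    factors = term.split("*")
    return "random" if any(f in random_factors for f in factors) else "fixed"


random_factors = {"C"}   # A and B are fixed, C is random
terms = ["A", "B", "A*B", "C", "A*C", "B*C", "A*B*C"]
classification = {term: effect_type(term, random_factors) for term in terms}

for term, kind in classification.items():
    print(f"{term}: {kind}")
```

Only A, B, and A*B are classified as fixed, matching the split of the model into its fixed effects part and its random effects part.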

The general linear mixed model in matrix notation is 

$$y = X\beta + Z_1 u_1 + Z_2 u_2 + \cdots + Z_k u_k + \varepsilon$$

where $y$ is an observed $N \times 1$ data vector, $X\beta$ is the fixed effects part of the model,
$Z_1 u_1 + Z_2 u_2 + \cdots + Z_k u_k$ is the random effects part of the model, and $\varepsilon$ is the residual part
of the model. The ideal conditions are that $u_i \sim N(0, \sigma_i^2 I_{n_i})$, $i = 1, 2, \ldots, k$; $\varepsilon \sim N(0, \sigma_\varepsilon^2 I_N)$; and
$u_i$ ($i = 1, 2, \ldots, k$) and $\varepsilon$ are independent random variables.

The conditional distribution of $y$ given $u_1, u_2, \ldots, u_k$ is represented by the fixed effects
model $y = X\beta + Z_1 u_1 + Z_2 u_2 + \cdots + Z_k u_k + \varepsilon$ where $\varepsilon \sim N(0, \sigma_\varepsilon^2 I_N)$ and where $u_1, u_2, \ldots, u_k$ are
fixed effects (because of the conditioning). The marginal distribution of $y$ is $y \sim N(X\beta, \Sigma)$
where $\Sigma = \sigma_1^2 Z_1 Z_1' + \sigma_2^2 Z_2 Z_2' + \cdots + \sigma_k^2 Z_k Z_k' + \sigma_\varepsilon^2 I_N$, or equivalently $y = X\beta + e$ where
$e \sim N(0, \Sigma)$.
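
The marginal covariance matrix can be assembled directly from the $Z_i$ matrices and the variance components. The following is a minimal sketch; the function name `marginal_covariance`, the layout (one random factor with three levels and two observations per level), and the values 4.0 and 1.0 for the variance components are all assumptions chosen only for illustration.

```python
import numpy as np

def marginal_covariance(Z_list, var_components, var_resid):
    """Return Sigma = sum_i sigma_i^2 Z_i Z_i' + sigma_eps^2 I_N."""
    N = Z_list[0].shape[0]
    Sigma = var_resid * np.eye(N)
    for Z, s2 in zip(Z_list, var_components):
        Sigma += s2 * (Z @ Z.T)
    return Sigma

# Hypothetical layout: 3 levels of one random factor, 2 observations per level.
Z1 = np.kron(np.eye(3), np.ones((2, 1)))   # 6 x 3 indicator matrix
Sigma = marginal_covariance([Z1], var_components=[4.0], var_resid=1.0)
print(Sigma)
```

Observations that share a level of the random factor have covariance equal to that factor's variance component, while observations from different levels are uncorrelated.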

The population parameters of the mixed model are $\beta, \sigma_1^2, \sigma_2^2, \ldots, \sigma_k^2$, and $\sigma_\varepsilon^2$. The analysis
of the random effects part of the model consists of estimating, testing hypotheses, and
constructing confidence intervals about the variance components $\sigma_1^2, \sigma_2^2, \ldots, \sigma_k^2$, and $\sigma_\varepsilon^2$. The
analysis of the fixed effects part of the model consists of estimating, testing hypotheses,
and constructing confidence intervals about estimable functions of $\beta$. This marginal
distribution of $y$ is based on the assumption that the covariance matrices of the random
effects are scalar multiples of identity matrices. A more general form of the mixed
effects model is $y = X\beta + Z_1 u_1 + Z_2 u_2 + \cdots + Z_k u_k + \varepsilon$ where the assumptions are that $u_i \sim
N(0, G_i)$, $i = 1, 2, \ldots, k$, $\varepsilon \sim N(0, R)$, and $u_i$ ($i = 1, 2, \ldots, k$) and $\varepsilon$ are independent random variables.
The marginal distribution of $y$ can be stated in more general terms as $y \sim N(X\beta, \Sigma)$ where
$\Sigma = Z_1 G_1 Z_1' + Z_2 G_2 Z_2' + \cdots + Z_k G_k Z_k' + R$, or equivalently as $y = X\beta + e$ where $e \sim N(0, \Sigma)$.

Ideally, the matrices $G_1, G_2, \ldots, G_k$ and $R$ are positive definite and consist of the
parameters required to model the covariance structure of the random effects and of the residual
parts of the model. The analysis of the mixed model is presented in the next two sections,
one section for each part of the analysis. The analysis of the random effects and residual
parts of the model is considered first.


22.2 Analysis of the Random Effects Part of the Mixed Model 

A random effects model can be constructed from the general linear mixed model by fitting 
the fixed effects part of the model first, and then computing the residuals of that model. 
The resulting model does not depend on the fixed effects part of the model. The general 
linear mixed model expressed as the marginal distribution of y is 

$$y = X\beta + e \quad \text{where } e \sim N(0, \Sigma) \text{ and } \Sigma = \sigma_1^2 Z_1 Z_1' + \sigma_2^2 Z_2 Z_2' + \cdots + \sigma_k^2 Z_k Z_k' + \sigma_\varepsilon^2 I_N.$$

The (ordinary) least squares estimator of $\beta$ is $\hat\beta = (X'X)^{-} X'y$ where $(X'X)^{-}$ denotes a
Moore-Penrose generalized inverse of $X'X$. The vector of residuals is

$$r = y - X\hat\beta = (I - XX^{-})y$$

where $X^{-} = (X'X)^{-}X'$.

A model for the vector of residuals (called the residual model) is

$$r = (I - XX^{-})Z_1 u_1 + (I - XX^{-})Z_2 u_2 + \cdots + (I - XX^{-})Z_k u_k + (I - XX^{-})\varepsilon$$

The residual model does not depend on the fixed effects parameters or on the fixed effects
part of the model, $X\beta$, and thus the residual model is a random effects model. Methods
for analyzing random effects models discussed in Chapters 18-20 can be used to analyze
the residual random effects model. The four techniques, method of moments, maximum
likelihood method, REML method, and the MINQUE method, are the topics of the next
four subsections. In Chapter 23, the methods are demonstrated for two examples, one being
a balanced data set and one an unbalanced data set.
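
That the residual vector is free of the fixed effects can be checked numerically. The sketch below assumes a hypothetical design matrix `X` (two treatments, three observations each), arbitrary values for the fixed effects, and random noise standing in for the random effects plus residual part; none of these come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical one-way fixed part: 2 treatments, 3 observations each.
X = np.kron(np.eye(2), np.ones((3, 1)))          # 6 x 2 design matrix
M = np.eye(6) - X @ np.linalg.pinv(X)            # I - X X^-

beta = np.array([10.0, 20.0])                    # arbitrary fixed effects
noise = rng.normal(size=6)                       # stands in for Zu + e

r_with_fixed = M @ (X @ beta + noise)            # residuals of y = X beta + noise
r_without_fixed = M @ noise                      # same projection with beta removed
print(np.allclose(r_with_fixed, r_without_fixed))
```

Because $(I - XX^{-})X = 0$, the two residual vectors agree whatever value is chosen for `beta`.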

22.2.1 Method of Moments 

As described in Chapter 19 for the analysis of a random effects model, the method of 
moments technique requires computing sums of squares, determining their expectations, 
and then estimating the variance components from the system of equations obtained by 
equating the observed mean squares to their expected values. In order to estimate the 
variance components, sums of squares must be obtained whose expectations do not 
depend on the fixed effects part of the model. The method of fitting constants discussed in 
Section 18.3 provides such sums of squares when the fixed effects part of the model is 
fitted first, followed by the random effects. 

To demonstrate the analysis of the random effects part of the mixed model, consider a 
two-way treatment structure in a completely randomized design structure with one fixed 
factor, denoted by B, and the other random factor, denoted by T. The resulting two-way 
mixed model is 


$$y_{ijk} = \mu + \beta_i + t_j + g_{ij} + \varepsilon_{ijk}, \quad i = 1, 2, \ldots, b, \quad j = 1, 2, \ldots, t, \quad k = 1, 2, \ldots, n_{ij}$$

where it is assumed that $t_j$ is distributed i.i.d. $N(0, \sigma_t^2)$, $g_{ij}$ is distributed i.i.d. $N(0, \sigma_g^2)$, and $\varepsilon_{ijk}$
is distributed i.i.d. $N(0, \sigma_\varepsilon^2)$. The sums of squares obtained from the method of fitting
constants are $R(\beta \mid \mu)$, $R(t \mid \mu, \beta)$, $R(g \mid \mu, \beta, t)$, and SSERROR. The expectations of the last three
mean squares have the form

$$E[MSR(t \mid \mu, \beta)] = \sigma_\varepsilon^2 + k_1 \sigma_g^2 + k_2 \sigma_t^2$$
$$E[MSR(g \mid \mu, \beta, t)] = \sigma_\varepsilon^2 + k_3 \sigma_g^2$$

and

$$E[MSERROR] = \sigma_\varepsilon^2$$

respectively, for some constants $k_1$, $k_2$, and $k_3$. The values of $k_1$, $k_2$, and $k_3$ will depend on the
sample sizes and the structure of the data. The expectations of these mean squares do not
involve the fixed effects parameters and thus the mean squares can be used to estimate the
variance components as well as to test hypotheses about them.

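The system of equations is constructed by equating the observed mean squares to the
expected mean squares where $\hat\sigma_t^2$, $\hat\sigma_g^2$, and $\hat\sigma_\varepsilon^2$ denote the solution; that is,

$$MSR(t \mid \mu, \beta) = \hat\sigma_\varepsilon^2 + k_1 \hat\sigma_g^2 + k_2 \hat\sigma_t^2$$
$$MSR(g \mid \mu, \beta, t) = \hat\sigma_\varepsilon^2 + k_3 \hat\sigma_g^2$$

and

$$MSERROR = \hat\sigma_\varepsilon^2$$

The solution is

$$\hat\sigma_\varepsilon^2 = MSERROR$$

$$\hat\sigma_g^2 = [MSR(g \mid \mu, \beta, t) - \hat\sigma_\varepsilon^2]/k_3$$

and

$$\hat\sigma_t^2 = [MSR(t \mid \mu, \beta) - k_1 \hat\sigma_g^2 - \hat\sigma_\varepsilon^2]/k_2$$

The estimates of the variance components are

$$\tilde\sigma_\varepsilon^2 = \hat\sigma_\varepsilon^2, \qquad
\tilde\sigma_g^2 = \begin{cases} \hat\sigma_g^2 & \text{if } \hat\sigma_g^2 > 0 \\ 0 & \text{if } \hat\sigma_g^2 \le 0 \end{cases}$$

and

$$\tilde\sigma_t^2 = \begin{cases} \hat\sigma_t^2 & \text{if } \hat\sigma_t^2 > 0 \\ 0 & \text{if } \hat\sigma_t^2 \le 0 \end{cases}$$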

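Approximate $(1 - \alpha)100\%$ confidence intervals about the variance components are

$$\frac{df_r\,\hat\sigma_r^2}{\chi^2_{\alpha/2,\,df_r}} \le \sigma_r^2 \le \frac{df_r\,\hat\sigma_r^2}{\chi^2_{1-(\alpha/2),\,df_r}}, \qquad r = t, g, \varepsilon$$

where

$$df_\varepsilon = df_{MSERROR},$$

$$df_g = \frac{(\hat\sigma_g^2)^2}{\dfrac{[(1/k_3)\,MSR(g \mid \mu, \beta, t)]^2}{df_{MSR(g \mid \mu, \beta, t)}} + \dfrac{[(1/k_3)\,MSERROR]^2}{df_{MSERROR}}}$$

and

$$df_t = \frac{(\hat\sigma_t^2)^2}{\dfrac{[(1/k_2)\,MSR(t \mid \mu, \beta)]^2}{df_{MSR(t \mid \mu, \beta)}} + \dfrac{[(k_1/(k_3 k_2))\,MSR(g \mid \mu, \beta, t)]^2}{df_{MSR(g \mid \mu, \beta, t)}} + \dfrac{[(1/k_2)(1 - (k_1/k_3))\,MSERROR]^2}{df_{MSERROR}}}$$

The coefficients in the denominators are those obtained by writing $\hat\sigma_g^2$ and $\hat\sigma_t^2$ as linear
combinations of the mean squares.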
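
The approximate degrees of freedom follow the usual Satterthwaite pattern for a linear combination of mean squares. A small sketch, in which the helper `satterthwaite_df` is a hypothetical name and the inputs are the interaction and residual mean squares of Table 23.2 (with $k_3 = 3$); the chi-square percentiles needed for the interval limits themselves are not computed here.

```python
def satterthwaite_df(terms):
    """terms: (coefficient, mean_square, df) triples of a linear combination."""
    estimate = sum(c * ms for c, ms, _ in terms)
    denom = sum((c * ms) ** 2 / df for c, ms, df in terms)
    return estimate ** 2 / denom

k3 = 3
# sigma_g^2 solution = (1/k3) MSR(g) - (1/k3) MSERROR, Table 23.2 values.
df_g = satterthwaite_df([(1 / k3, 42.653, 10), (-1 / k3, 0.924630, 36)])
print(df_g)
```

Since the interaction mean square dominates the residual mean square in this example, the approximate degrees of freedom fall just below the 10 degrees of freedom of the interaction mean square.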


The expected mean squares can be used to construct tests of hypotheses about the
variance components in the model. The statistic to test $H_0\colon \sigma_t^2 = 0$ vs $H_a\colon \sigma_t^2 > 0$ is

$$F_c = \frac{MSR(t \mid \mu, \beta)}{Q}$$

where

$$Q = \frac{k_1}{k_3}\,MSR(g \mid \mu, \beta, t) + \left[1 - \frac{k_1}{k_3}\right] MSERROR$$

which has an approximate sampling distribution of F with $df_{MSR(t \mid \mu, \beta)}$ numerator degrees of
freedom and $\nu$ denominator degrees of freedom where

$$\nu = \frac{Q^2}{\dfrac{[(k_1/k_3)\,MSR(g \mid \mu, \beta, t)]^2}{df_{MSR(g \mid \mu, \beta, t)}} + \dfrac{[(1 - (k_1/k_3))\,MSERROR]^2}{df_{MSERROR}}}$$

as determined by a Satterthwaite approximation.

The statistic to test $H_0\colon \sigma_g^2 = 0$ vs $H_a\colon \sigma_g^2 > 0$ is

$$F_c = \frac{MSR(g \mid \mu, \beta, t)}{MSERROR}$$

which has a sampling distribution of F with $df_{MSR(g \mid \mu, \beta, t)}$ numerator degrees of freedom and
$df_{MSERROR}$ denominator degrees of freedom.
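
The approximate F test for $\sigma_t^2$ can be sketched as follows. The function name `approx_f_test` is hypothetical, and the inputs are the person, person x machine, and residual lines of Table 23.2, where $k_1 = k_3 = 3$ so that $Q$ reduces to the interaction mean square.

```python
def approx_f_test(ms_t, ms_g, ms_error, df_g, df_error, k1, k3):
    """Return (F_c, nu) for H0: sigma_t^2 = 0 using the Satterthwaite denominator."""
    w = k1 / k3
    Q = w * ms_g + (1 - w) * ms_error
    nu = Q ** 2 / ((w * ms_g) ** 2 / df_g + ((1 - w) * ms_error) ** 2 / df_error)
    return ms_t / Q, nu

F_c, nu = approx_f_test(248.379, 42.653, 0.924630, df_g=10, df_error=36, k1=3, k3=3)
print(round(F_c, 2), nu)
```

With balanced data the weight on MSERROR is zero, so the denominator degrees of freedom equal those of the interaction mean square and the statistic matches the F-value listed for Person in Table 23.2.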


22.2.2 Method of Maximum Likelihood 

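The method of maximum likelihood can be applied to the complete likelihood function,
denoted by

$$L(\beta, \sigma_1^2, \sigma_2^2, \ldots, \sigma_k^2, \sigma_\varepsilon^2) = (2\pi)^{-N/2}\,|\Sigma|^{-1/2} \exp\left[-\tfrac{1}{2}(y - X\beta)'\Sigma^{-1}(y - X\beta)\right].$$

Maximizing this likelihood function or equivalently minimizing $-2\log[L(\beta, \sigma_1^2, \sigma_2^2, \ldots, \sigma_k^2, \sigma_\varepsilon^2)]$ with
respect to all of the parameters provides equations for simultaneously estimating both the
fixed parameters and the random effects parameters (Hartley and Rao, 1967). The required
equations are

$$\left.\frac{\partial l}{\partial \beta}\right|_{\beta=\hat\beta,\,\sigma=\hat\sigma} = 0, \quad
\left.\frac{\partial l}{\partial \sigma_1^2}\right|_{\beta=\hat\beta,\,\sigma=\hat\sigma} = 0, \quad
\left.\frac{\partial l}{\partial \sigma_2^2}\right|_{\beta=\hat\beta,\,\sigma=\hat\sigma} = 0, \quad \ldots, \quad
\left.\frac{\partial l}{\partial \sigma_k^2}\right|_{\beta=\hat\beta,\,\sigma=\hat\sigma} = 0, \quad
\left.\frac{\partial l}{\partial \sigma_\varepsilon^2}\right|_{\beta=\hat\beta,\,\sigma=\hat\sigma} = 0$$

where $l = -2\log[L(\beta, \sigma_1^2, \sigma_2^2, \ldots, \sigma_k^2, \sigma_\varepsilon^2)]$, $\hat\sigma' = [\hat\sigma_1^2, \hat\sigma_2^2, \ldots, \hat\sigma_k^2, \hat\sigma_\varepsilon^2]$, and $\sigma' = [\sigma_1^2, \sigma_2^2, \ldots, \sigma_k^2, \sigma_\varepsilon^2]$.

The two-way mixed model with equal numbers of observations is used to demonstrate
the computation of the maximum likelihood estimators. The model can be expressed as

$$y_{ijk} = \mu_i + a_j + g_{ij} + \varepsilon_{ijk}, \quad i = 1, 2, \ldots, t, \quad j = 1, 2, \ldots, a, \quad k = 1, 2, \ldots, n$$

where $\mu_i$ denotes the mean of the ith level of the fixed effect factor T, $a_j$ denotes the effect of
the jth level of the random effect factor A, $g_{ij}$ denotes the random interaction effect, and $\varepsilon_{ijk}$
denotes the residual effect. Under the ideal conditions, $a_j \sim$ i.i.d. $N(0, \sigma_a^2)$, $g_{ij} \sim$ i.i.d. $N(0, \sigma_g^2)$,
$\varepsilon_{ijk} \sim$ i.i.d. $N(0, \sigma_\varepsilon^2)$, and the $a_j$, $g_{ij}$, and $\varepsilon_{ijk}$ are all independent random variables. The variance
of the data vector is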


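$$\begin{aligned}
\Sigma &= \sigma_a^2[J_n \otimes I_a \otimes J_t] + \sigma_g^2[J_n \otimes I_a \otimes I_t] + \sigma_\varepsilon^2[I_n \otimes I_a \otimes I_t] \\
&= (\sigma_\varepsilon^2 + n\sigma_g^2 + nt\sigma_a^2)\left[\tfrac{1}{n}J_n \otimes I_a \otimes \tfrac{1}{t}J_t\right]
 + (\sigma_\varepsilon^2 + n\sigma_g^2)\left[\tfrac{1}{n}J_n \otimes I_a \otimes \left(I_t - \tfrac{1}{t}J_t\right)\right]
 + \sigma_\varepsilon^2\left[\left(I_n - \tfrac{1}{n}J_n\right) \otimes I_a \otimes I_t\right] \\
&= \lambda_1 V_1 + \lambda_2 V_2 + \lambda_3 V_3 \quad \text{(say)}
\end{aligned}$$

where $\lambda_1 = \sigma_\varepsilon^2 + n\sigma_g^2 + nt\sigma_a^2$, $\lambda_2 = \sigma_\varepsilon^2 + n\sigma_g^2$, and $\lambda_3 = \sigma_\varepsilon^2$, and $V_1$, $V_2$, and $V_3$ are idempotent and
pairwise orthogonal. The notation $A \otimes B$ denotes the direct product of matrices $A$ and $B$
(Graybill 1976). Thus, $|\Sigma| = \lambda_1^{a}\lambda_2^{a(t-1)}\lambda_3^{at(n-1)}$ and $\Sigma^{-1} = (1/\lambda_1)V_1 + (1/\lambda_2)V_2 + (1/\lambda_3)V_3$. The value of
$l = -2\log(L)$ can be expressed as $l = nta\log(2\pi) + a\log(\lambda_1) + a(t-1)\log(\lambda_2) + at(n-1)\log(\lambda_3) + Q$
where $Q = [y - (j_n \otimes j_a \otimes I_t)\mu]'\,\Sigma^{-1}\,[y - (j_n \otimes j_a \otimes I_t)\mu]$. With some algebra and using the
above relationships, the value of $Q$ can be expressed as

$$Q = \frac{1}{\lambda_1}y'A_1 y + \frac{1}{\lambda_2}y'A_2 y + \frac{1}{\lambda_3}y'A_3 y + \frac{1}{\lambda_1}nat(\bar y_{...} - \bar\mu_{.})^2 + \frac{na}{\lambda_2}\sum_{i=1}^{t}(\bar y_{i..} - \bar y_{...} + \bar\mu_{.} - \mu_i)^2$$

where

$$y'A_1 y = y'\left[\tfrac{1}{n}J_n \otimes \left(I_a - \tfrac{1}{a}J_a\right) \otimes \tfrac{1}{t}J_t\right]y = SSA$$

$$y'A_2 y = y'\left[\tfrac{1}{n}J_n \otimes \left(I_a - \tfrac{1}{a}J_a\right) \otimes \left(I_t - \tfrac{1}{t}J_t\right)\right]y = SSTA$$

$$y'A_3 y = y'\left[\left(I_n - \tfrac{1}{n}J_n\right) \otimes I_a \otimes I_t\right]y = SSERROR$$

The value of $-2\log(\text{likelihood})$ can be expressed as

$$\begin{aligned}
l ={}& nta\log(2\pi) + a\log(\lambda_1) + a(t-1)\log(\lambda_2) + at(n-1)\log(\lambda_3) \\
&+ \frac{1}{\lambda_3}SSERROR + \frac{1}{\lambda_1}SSA + \frac{1}{\lambda_2}SSTA + \frac{1}{\lambda_1}nat(\bar y_{...} - \bar\mu_{.})^2 \\
&+ \frac{na}{\lambda_2}\sum_{i=1}^{t}(\bar y_{i..} - \bar y_{...} + \bar\mu_{.} - \mu_i)^2.
\end{aligned}$$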

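Next differentiate $l$ with respect to $\lambda_1$, $\lambda_2$, $\lambda_3$, $\bar\mu_{.}$, and $(\mu_i - \bar\mu_{.})$, and set the derivatives equal
to zero to provide the system of equations:

$$\left.\frac{\partial l}{\partial \lambda_3}\right|_{\mu=\hat\mu,\,\lambda=\hat\lambda} = \frac{at(n-1)}{\hat\lambda_3} - \frac{SSERROR}{\hat\lambda_3^2} = 0 \;\Rightarrow\; \hat\lambda_3 = \frac{SSERROR}{at(n-1)}$$

$$\left.\frac{\partial l}{\partial \lambda_2}\right|_{\mu=\hat\mu,\,\lambda=\hat\lambda} = \frac{a(t-1)}{\hat\lambda_2} - \frac{SSTA}{\hat\lambda_2^2} = 0 \;\Rightarrow\; \hat\lambda_2 = \frac{SSTA}{a(t-1)}$$

$$\left.\frac{\partial l}{\partial \lambda_1}\right|_{\mu=\hat\mu,\,\lambda=\hat\lambda} = \frac{a}{\hat\lambda_1} - \frac{SSA}{\hat\lambda_1^2} = 0 \;\Rightarrow\; \hat\lambda_1 = \frac{SSA}{a}$$

$$\left.\frac{\partial l}{\partial \bar\mu_{.}}\right|_{\mu=\hat\mu,\,\lambda=\hat\lambda} = \frac{-2nat(\bar y_{...} - \hat{\bar\mu}_{.})}{\hat\lambda_1} = 0 \;\Rightarrow\; \hat{\bar\mu}_{.} = \bar y_{...}$$

$$\left.\frac{\partial l}{\partial(\mu_i - \bar\mu_{.})}\right|_{\mu=\hat\mu,\,\lambda=\hat\lambda} = \frac{-2na[\bar y_{i..} - \bar y_{...} - (\hat\mu_i - \hat{\bar\mu}_{.})]}{\hat\lambda_2} = 0 \;\Rightarrow\; (\hat\mu_i - \hat{\bar\mu}_{.}) = \bar y_{i..} - \bar y_{...}$$

or $\hat\mu_i = \bar y_{i..}$. The terms of $l$ involving the means vanish at these solutions, which is why they do not
appear in the equations for $\hat\lambda_1$ and $\hat\lambda_2$.

Using the fact that $\lambda_1 = \sigma_\varepsilon^2 + n\sigma_g^2 + nt\sigma_a^2$, $\lambda_2 = \sigma_\varepsilon^2 + n\sigma_g^2$, and $\lambda_3 = \sigma_\varepsilon^2$, the maximum
likelihood estimators of the variance components are:

$$\hat\sigma_\varepsilon^2 = \hat\lambda_3, \qquad
\hat\sigma_g^2 = \begin{cases} \dfrac{\hat\lambda_2 - \hat\lambda_3}{n} & \text{if } \hat\lambda_2 > \hat\lambda_3 \\ 0 & \text{if } \hat\lambda_2 \le \hat\lambda_3 \end{cases}, \qquad \text{and} \qquad
\hat\sigma_a^2 = \begin{cases} \dfrac{\hat\lambda_1 - \hat\lambda_2}{nt} & \text{if } \hat\lambda_1 > \hat\lambda_2 \\ 0 & \text{if } \hat\lambda_1 \le \hat\lambda_2 \end{cases}$$

Note that $\hat\sigma_\varepsilon^2$ is an unbiased estimator of $\sigma_\varepsilon^2$ while $\hat\sigma_g^2$ and $\hat\sigma_a^2$ are biased estimators of $\sigma_g^2$
and $\sigma_a^2$, respectively.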
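
For balanced data, the maximum likelihood estimates therefore need only the three sums of squares. The sketch below uses the hypothetical function `ml_estimates` with the sums of squares reported in Table 23.2 for the balanced machine-person example, where persons are the random factor ($a = 6$), machines the fixed factor ($t = 3$), and $n = 3$.

```python
def ml_estimates(ss_a, ss_ta, ss_error, a, t, n):
    """Balanced two-way mixed model: ML estimates (sigma_a^2, sigma_g^2, sigma_e^2)."""
    lam1 = ss_a / a
    lam2 = ss_ta / (a * (t - 1))
    lam3 = ss_error / (a * t * (n - 1))
    s2_e = lam3
    s2_g = (lam2 - lam3) / n if lam2 > lam3 else 0.0
    s2_a = (lam1 - lam2) / (n * t) if lam1 > lam2 else 0.0
    return s2_a, s2_g, s2_e

# Sums of squares for Person, Person x machine, and Residual from Table 23.2.
s2_a, s2_g, s2_e = ml_estimates(1241.895, 426.530, 33.286667, a=6, t=3, n=3)
print(round(s2_a, 4), round(s2_g, 4), round(s2_e, 4))
```

These values agree with the ML column of Table 23.3; the divisors $a$ and $a(t-1)$, rather than $a-1$ and $(a-1)(t-1)$, are the source of the downward bias noted above.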

22.2.3 Method of Residual Maximum Likelihood 

Corbeil and Searle (1976) expressed the likelihood function in two parts, one involving the
fixed effects and one free of the fixed effects. They then obtained maximum likelihood
estimators of the variance components from that part of the model free of the fixed effects,
which they call restricted maximum likelihood estimators, or REML estimators. To
illustrate, consider again the general linear mixed model $y = X\beta + Z_1 u_1 + Z_2 u_2 + \cdots + Z_k u_k + \varepsilon$
with the usual assumptions given in Section 22.1. Assume that the rank of $X$ is equal to $p$.
To construct the restricted likelihood function, let $H$ be an $N \times (N - p)$ matrix of rank $N - p$
such that $HH' = (I - XX^{-})$. Define the transformation

$$z = \begin{bmatrix} z_1 \\ z_2 \end{bmatrix} = \begin{bmatrix} X' \\ H' \end{bmatrix} y$$

where

$$\begin{bmatrix} X' \\ H' \end{bmatrix}$$

is an $N \times N$ nonsingular matrix. Thus the transformation from $y$ to $z$ is a one-to-one
transformation. The distribution of

$$z = \begin{bmatrix} z_1 \\ z_2 \end{bmatrix}$$

is

$$\begin{bmatrix} z_1 \\ z_2 \end{bmatrix} \sim N\left( \begin{bmatrix} X'X\beta \\ 0 \end{bmatrix},\; \begin{bmatrix} X'\Sigma X & X'\Sigma H \\ H'\Sigma X & H'\Sigma H \end{bmatrix} \right)$$

The joint likelihood function of

$$z = \begin{bmatrix} z_1 \\ z_2 \end{bmatrix}$$

can be partitioned into the likelihood of $z_1$ given $z_2$ times the marginal likelihood of $z_2$.
The marginal likelihood of $z_2$ does not depend on the fixed effect parameters of the model,
but does depend on the variance components. The marginal likelihood of $z_2$ is

$$L_H(z_2) = L_H(\sigma_1^2, \sigma_2^2, \ldots, \sigma_k^2, \sigma_\varepsilon^2) = (2\pi)^{-(N-p)/2}\,|H'\Sigma H|^{-1/2} \exp\left[-\tfrac{1}{2}\,y'H(H'\Sigma H)^{-1}H'y\right]$$

The REML estimators of the variance components are the values of $\sigma_1^2, \sigma_2^2, \ldots, \sigma_k^2$ and
$\sigma_\varepsilon^2$ that maximize $L_H(\sigma_1^2, \sigma_2^2, \ldots, \sigma_k^2, \sigma_\varepsilon^2)$. This restricted likelihood function is for the
random vector $z_2$. But the residuals of the fixed effects model (see beginning of Section
22.2) can be expressed as $r = Hz_2$, a transformation of $z_2$. So the restricted likelihood
function utilizes the information from the residual model that does not depend on the fixed
effects, or the restricted likelihood is a function of the residuals of the model. Hence, the
name selected for this method is residual maximum likelihood. The process of maximizing
$l_R = \log[L_H(\sigma_1^2, \sigma_2^2, \ldots, \sigma_k^2, \sigma_\varepsilon^2)]$ yields a set of equations that needs to be solved. There
is no guarantee that the solution of the set of equations provides a solution set that is in
the parameter space. Again, it is important to obtain non-negative estimates of the
variance components since these estimates are used in the analysis of the fixed effects part of
the model.

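On revisiting the balanced two-way mixed model described in Section 22.2.2, the
likelihood function can be expressed as

$$\begin{aligned}
L(\mu, \lambda_1, \lambda_2, \lambda_3) ={}& \left[(2\pi)^{1/2}\lambda_1^{1/2}\right]^{-1} \exp\left[-\frac{nat}{2\lambda_1}(\bar y_{...} - \bar\mu_{.})^2\right] \\
&\times \left[(2\pi)^{(t-1)/2}\lambda_2^{(t-1)/2}\right]^{-1} \exp\left[-\frac{na}{2\lambda_2}\sum_{i=1}^{t}(\bar y_{i..} - \bar y_{...} + \bar\mu_{.} - \mu_i)^2\right] \\
&\times \left[(2\pi)^{(nat-t)/2}\lambda_1^{(a-1)/2}\lambda_2^{(a-1)(t-1)/2}\lambda_3^{at(n-1)/2}\right]^{-1} \exp\left[-\frac{1}{2}\left(\frac{SSERROR}{\lambda_3} + \frac{SSA}{\lambda_1} + \frac{SSTA}{\lambda_2}\right)\right] \\
={}& L(\bar\mu_{.}) \times L(\mu_i - \bar\mu_{.}) \times L(\lambda_1, \lambda_2, \lambda_3) \quad \text{(say)}
\end{aligned}$$

The REML estimates of $\lambda_1$, $\lambda_2$, and $\lambda_3$ are obtained by minimizing $-2\log[L(\lambda_1, \lambda_2, \lambda_3)] = l_R$,
which is accomplished by differentiating $l_R$ with respect to $\lambda_1$, $\lambda_2$, and $\lambda_3$ and setting the
derivatives equal to zero. The derivatives and solutions are:

$$\left.\frac{\partial l_R}{\partial \lambda_1}\right|_{\lambda=\hat\lambda} = \frac{(a-1)}{\hat\lambda_1} - \frac{SSA}{\hat\lambda_1^2} = 0 \;\Rightarrow\; \hat\lambda_1 = \frac{SSA}{a-1}$$

$$\left.\frac{\partial l_R}{\partial \lambda_2}\right|_{\lambda=\hat\lambda} = \frac{(a-1)(t-1)}{\hat\lambda_2} - \frac{SSTA}{\hat\lambda_2^2} = 0 \;\Rightarrow\; \hat\lambda_2 = \frac{SSTA}{(a-1)(t-1)}$$

$$\left.\frac{\partial l_R}{\partial \lambda_3}\right|_{\lambda=\hat\lambda} = \frac{at(n-1)}{\hat\lambda_3} - \frac{SSERROR}{\hat\lambda_3^2} = 0 \;\Rightarrow\; \hat\lambda_3 = \frac{SSERROR}{at(n-1)}$$

The estimates of the variance components are computed as

$$\hat\sigma_\varepsilon^2 = \hat\lambda_3, \qquad
\hat\sigma_g^2 = \begin{cases} \dfrac{\hat\lambda_2 - \hat\lambda_3}{n} & \text{if } \hat\lambda_2 > \hat\lambda_3 \\ 0 & \text{if } \hat\lambda_2 \le \hat\lambda_3 \end{cases}, \qquad \text{and} \qquad
\hat\sigma_a^2 = \begin{cases} \dfrac{\hat\lambda_1 - \hat\lambda_2}{nt} & \text{if } \hat\lambda_1 > \hat\lambda_2 \\ 0 & \text{if } \hat\lambda_1 \le \hat\lambda_2 \end{cases}$$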

The REML estimates are unbiased estimates of the variance components and they are
identical to those one would obtain using the method of moments.

22.2.4 MINQUE Method

The MINQUE method for a random model is described in Section 19.3. In that application,
the mean of the model is $j_n\mu$ while the mean of this mixed model is $X\beta$. The estimator can
be generalized to the general linear mixed model (Swallow and Searle, 1978) where the
MINQUE estimator of

$\sigma = (\sigma_1^2, \sigma_2^2, \ldots, \sigma_k^2, \sigma_\varepsilon^2)'$ is $\hat\sigma = S^{-1}q$ where the matrix $S$ has elements

$s_{ii'} = \mathrm{tr}[Z_i Z_i' R Z_{i'} Z_{i'}' R]$, $i, i' = 1, 2, \ldots, k + 1$, where $i = k + 1$ corresponds to $\varepsilon$ and $Z_{k+1} = I_N$,

$$R = \Sigma^{-1}[\Sigma - X(X'\Sigma^{-1}X)^{-}X']\Sigma^{-1}$$

and $q$ has elements $q_i = y'RZ_iZ_i'Ry$, $i = 1, 2, \ldots, k + 1$.

The solution depends on the elements of $\Sigma$, which for a MINQUE solution are generally
selected to be 1 for variances and 0 for covariances. If the model is balanced, the solution
generally does not depend on the values selected for $\Sigma$ and the process converges in one
iteration. There is no guarantee that the solution to the system provides values in the
parameter space. Be sure to use a solution that provides non-negative estimators, since
the estimators need to be used in analyzing the fixed part of the model. One-iteration
MINQUE estimators are implemented in some software packages (called zero iteration)
and are computed for the two examples to be discussed in Chapter 23.


22.3 Analysis of the Fixed Effects Part of the Model 

The analysis of the fixed effects part of a mixed model consists of all aspects of the analysis
of a fixed effects model, as described below.


22.3.1 Estimation

There are several methods for estimating estimable functions of $\beta$ in the mixed model. The
general linear mixed model can be expressed as

$$y = X\beta + e \quad \text{where } \mathrm{Var}(e) = \Sigma$$

and

$$\Sigma = \sigma_1^2 Z_1 Z_1' + \sigma_2^2 Z_2 Z_2' + \cdots + \sigma_k^2 Z_k Z_k' + \sigma_\varepsilon^2 I_N$$

A linear combination $a'\beta$ is estimable for this mixed model if and only if there exists a
vector $c$ such that $E(c'y) = a'\beta$. This definition is the same as that for the general linear
model (see Chapter 6). The estimate of $a'\beta$, an estimable function of $\beta$, is $a'\hat\beta$ where $\hat\beta$ is any
estimator of $\beta$.

The ordinary least squares estimator of $a'\beta$ is $a'\hat\beta_{LS}$ where $\hat\beta_{LS} = (X'X)^{-}X'y$ or some other
solution for $\beta$ to the normal equations $X'X\beta = X'y$. The least squares estimator of $\beta$ does
not depend on the covariance matrix of $y$; that is, it does not depend on $\Sigma$.

If the elements of $\Sigma$ are known (that is, if $\sigma_1^2, \sigma_2^2, \ldots, \sigma_k^2$ and $\sigma_\varepsilon^2$ are known), the best linear
unbiased estimator (BLUE) of $a'\beta$ is $a'\hat\beta_{BLUE}$ where $\hat\beta_{BLUE} = (X'\Sigma^{-1}X)^{-}X'\Sigma^{-1}y$ or any other
solution for $\beta_{BLUE}$ in $X'\Sigma^{-1}X\beta_{BLUE} = X'\Sigma^{-1}y$.

For most balanced designs and for some simple unbalanced designs $\hat\beta_{BLUE} = \hat\beta_{LS}$. Thus for
these designs, the BLUE of $a'\beta$ is $a'\hat\beta_{LS}$ where $\hat\beta_{LS} = (X'X)^{-}X'y$, which does not depend on
the variance components.
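
The agreement between the least squares estimator and the BLUE in a balanced design can be illustrated numerically. The sketch below assumes a hypothetical balanced layout (two treatments in each of three random blocks), made-up responses `y`, and made-up variance components 2.0 and 1.0; none of these values come from the text.

```python
import numpy as np

t, b = 2, 3                                   # treatments, random blocks
X = np.kron(np.ones((b, 1)), np.eye(t))       # 6 x 2 fixed-effects design
Z = np.kron(np.eye(b), np.ones((t, 1)))       # 6 x 3 random-effects design
Sigma = 2.0 * Z @ Z.T + 1.0 * np.eye(t * b)   # assumed variance components
y = np.array([5.1, 7.9, 6.2, 9.0, 4.8, 8.1])  # hypothetical responses

Sigma_inv = np.linalg.inv(Sigma)
beta_ls = np.linalg.pinv(X.T @ X) @ X.T @ y
beta_blue = np.linalg.pinv(X.T @ Sigma_inv @ X) @ X.T @ Sigma_inv @ y
print(beta_ls, beta_blue)
```

Both estimators return the two treatment means, so changing the assumed variance components leaves the estimates unchanged in this balanced layout.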

When the designs are unbalanced and the variance components are unknown, life is
not so easy. Because the BLUE cannot be computed (it depends on the unknown variances),
a weighted least squares estimator must be obtained where $\hat\Sigma$ is used as the weighting
matrix. The estimated covariance matrix is

$$\hat\Sigma = \hat\sigma_1^2 Z_1 Z_1' + \hat\sigma_2^2 Z_2 Z_2' + \cdots + \hat\sigma_k^2 Z_k Z_k' + \hat\sigma_\varepsilon^2 I_N$$

where $\hat\sigma_1^2, \hat\sigma_2^2, \ldots, \hat\sigma_k^2$, and $\hat\sigma_\varepsilon^2$ are the estimators of the variance components obtained
using one of the methods discussed in Section 22.2. The weighted least squares estimator
of $a'\beta$ or estimated BLUE (EBLUE) of $a'\beta$ is $a'\hat\beta_W$ where $\hat\beta_W = (X'\hat\Sigma^{-1}X)^{-}X'\hat\Sigma^{-1}y$ or
some other solution for $\beta_W$ in $X'\hat\Sigma^{-1}X\beta_W = X'\hat\Sigma^{-1}y$. For most designs, $a'\hat\beta_W$ converges to
$a'\beta$ as the sample size increases. For convergence, some care must be taken so that as the
sample size increases, the number of parameters does not go to infinity. The large sample
variance of $a'\hat\beta_W$ is equal to $\mathrm{Var}(a'\hat\beta_W) = a'(X'\Sigma^{-1}X)^{-}a$. This approximation to the
variance of $a'\hat\beta_W$ does not take into account the variability in the estimates of the variance
components. Kackar and Harville (1984) showed that $a'(X'\hat\Sigma^{-1}X)^{-}a$ is too small and
needs to be adjusted for the fact that the true values of the variance components are
replaced by their respective estimates. Kackar and Harville (1984) and Kenward and
Roger (1997) use a Taylor series expansion about the unknown variance components to
provide an adjustment to the estimated standard errors of the fixed effects (that
approximation is beyond the scope of this text). Using the DDFM = KR option in SAS-Mixed
provides the adjusted estimated standard error for the estimates of the fixed effects
parameters. The option also estimates the degrees of freedom to associate with a
standard error by applying Satterthwaite's method to the KR adjusted standard error
estimate.

The method of maximum likelihood can also be used to obtain an estimator of $a'\beta$.
The maximum likelihood estimate of $a'\beta$ is $a'\hat\beta_{ML}$ where $\hat\beta_{ML}$ is a solution to the
unrestricted likelihood equations. The variance of $a'\hat\beta_{ML}$ is $a'Wa$ where $W$ is the partition of
the generalized inverse of the information matrix corresponding to $\hat\beta_{ML}$. The examples in
Chapter 23 demonstrate the above estimators for both balanced and unbalanced
designs.


22.3.2 Construction of Confidence Intervals 

A $(1 - \alpha)100\%$ confidence interval about $a'\beta$, where $a'\beta$ is an estimable
function of $\beta$, is obtained by using the asymptotic sampling distribution of $a'\hat\beta_W$, which is
$a'\hat\beta_W \sim N[a'\beta,\; a'(X'\Sigma^{-1}X)^{-}a]$. The estimate of the standard error of $a'\hat\beta_W$ is
$\widehat{s.e.}(a'\hat\beta_W) = \sqrt{a'(X'\hat\Sigma^{-1}X)^{-}a}$. A Kackar-Harville type of adjustment to the estimated standard error
should be used at this point. Approximate degrees of freedom are obtained by using the
generalization of the Satterthwaite approximation (Giesbrecht and Burns, 1985) as

$$\nu = \frac{2\,[a'(X'\hat\Sigma^{-1}X)^{-}a]^2}{\mathrm{Var}[a'(X'\hat\Sigma^{-1}X)^{-}a]}$$

Thus, an approximate $(1 - \alpha)100\%$ confidence interval about $a'\beta$ is

$$a'\hat\beta_W \pm (t_{\alpha/2,\,\nu})\sqrt{a'(X'\hat\Sigma^{-1}X)^{-}a}$$

A simultaneous set of confidence intervals can be constructed about a set of estimable
linear combinations $a_1'\beta, a_2'\beta, \ldots, a_m'\beta$, but the degrees of freedom need to be approximated
for each linear combination. Adjustments for multiple comparisons can then be made
using the techniques of Chapter 3.


22.3.3 Testing Hypotheses 

Often a test of hypothesis provides an appropriate inference procedure for linear
combinations of the fixed effects. To test $H_0\colon H\beta = b$ vs $H_a\colon H\beta \ne b$ compute the statistic

$$Q = (H\hat\beta_W - b)'\,[H(X'\hat\Sigma^{-1}X)^{-}H']^{-1}\,(H\hat\beta_W - b)$$

Under the conditions of the null hypothesis, the asymptotic sampling distribution of $Q$ is $\chi^2_q$
where $q = \mathrm{Rank}(H)$; that is, $q$ is the number of linearly independent linear combinations of $\beta$ in
$H$. A small sample test statistic is $F_c = Q/q$, whose sampling distribution is approximated by an
F distribution with $q$ numerator degrees of freedom and $\nu$ denominator degrees of freedom, where
$\nu$ is the approximate degrees of freedom associated with $H(X'\hat\Sigma^{-1}X)^{-}H'$. The approximate
degrees of freedom (SAS Institute, Inc., 1999) are computed by performing a spectral
decomposition on $H(X'\hat\Sigma^{-1}X)^{-}H'$ as
$H(X'\hat\Sigma^{-1}X)^{-}H' = P'\Lambda P$ where $\Lambda$ is the $q \times q$ diagonal matrix with the characteristic roots of
$H(X'\hat\Sigma^{-1}X)^{-}H'$ on the diagonal and $P$ is the $q \times q$ matrix of corresponding characteristic
vectors. Let $h_s$ be the sth row of $PH$ and $\lambda_s$ the sth diagonal element of $\Lambda$; then
$\nu_s = 2\lambda_s^2/(c_s'\Omega c_s)$ where

$$c_s = \frac{\partial[h_s(X'\Sigma^{-1}X)^{-}h_s']}{\partial\sigma}$$

(the vector of derivatives with respect to each of the variance components) and $\Omega$ is the
asymptotic covariance matrix of the estimates of the variance components, $\hat\sigma$. Then

$$g = \sum_{s=1}^{q} \frac{\nu_s}{\nu_s - 2}\,\mathrm{In}(\nu_s > 2)$$

where the indicator function $\mathrm{In}(\nu_s > 2)$ deletes terms where $\nu_s \le 2$. Then the approximate
number of denominator degrees of freedom associated with $F_c$ is

$$\nu = \begin{cases} \dfrac{2g}{g - q} & \text{if } g > q \\ 0 & \text{otherwise} \end{cases}$$

Examples are discussed in Chapter 23.
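
The last step, combining the individual $\nu_s$ into a single denominator degrees of freedom, can be sketched as follows. The function name `denominator_df` is hypothetical, and the $\nu_s$ values are invented for illustration; computing the $\nu_s$ themselves requires the derivatives and the asymptotic covariance matrix described above.

```python
def denominator_df(nu_s):
    """Combine per-contrast degrees of freedom nu_s into one denominator df."""
    q = len(nu_s)
    # Terms with nu_s <= 2 are dropped by the indicator function.
    g = sum(v / (v - 2) for v in nu_s if v > 2)
    return 2 * g / (g - q) if g > q else 0.0

print(denominator_df([10.0, 10.0]))   # two contrasts, each with 10 df
print(denominator_df([1.5]))          # single contrast with fewer than 2 df
```

When every $\nu_s$ equals a common value greater than 2, the combined value equals that common value; when no term survives the indicator, the result is set to zero.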





22.4 Best Linear Unbiased Prediction 

There are some situations where it is of interest to predict the value of a random variable or
to predict the values of the random effects used in a study. Suppose you have the linear
model $y = X\beta + e$ where $\mathrm{Var}(e) = \Sigma$. Suppose there is a random variable $\omega$ whose value is not
known, but it is assumed that $\omega \sim N(k'\beta, \sigma_\omega^2)$ and $\mathrm{Cov}(y, \omega) = c$. The object is to predict the
value of $\omega$. The predicted value of $\omega$ is called a best linear unbiased predictor (BLUP) when


1) $\hat\omega = a'y + b$;

2) $E(\hat\omega) = k'\beta$; and

3) $E(\hat\omega - \omega)^2$ is minimized.

The resulting BLUP of $\omega$ is $\hat\omega = c'\Sigma^{-1}(y - X\hat\beta_{BLUE}) + k'\hat\beta_{BLUE}$. When the elements of $\Sigma$ are not
known and need to be estimated, then the estimated BLUP (EBLUP) of $\omega$ is

$\hat\omega = c'\hat\Sigma^{-1}(y - X\hat\beta_W) + k'\hat\beta_W$. If $k'\beta = 0$, then $\hat\omega = c'\hat\Sigma^{-1}(y - X\hat\beta_W)$.

For the general linear mixed model, $y = X\beta + Zu + \varepsilon$ where $u \sim N(0, G)$ and $\varepsilon \sim N(0, R)$, the
covariance between $u$ and $y$ is $\mathrm{Cov}(u, y) = GZ'$ and the BLUP of $u$ is $\hat u = GZ'\Sigma^{-1}(y - X\hat\beta_{BLUE})$.


22.5 Mixed Model Equations 

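Henderson (1984) developed the mixed model equations where the solution simultaneously
yields the BLUE of estimable functions of $\beta$, $a'\beta$, and best linear unbiased predictors of the
random effects $u = (u_1', u_2', \ldots, u_k')'$. The mixed model is expressed as $y = X\beta + Zu + \varepsilon$ where

$$Zu = Z_1 u_1 + Z_2 u_2 + \cdots + Z_k u_k$$
$$u_i \sim N(0, \sigma_i^2 I_{n_i}), \quad i = 1, 2, \ldots, k$$
$$\varepsilon \sim N(0, \sigma_\varepsilon^2 I_N)$$

where $u_1, u_2, \ldots, u_k, \varepsilon$ are independent random variables.
Note that

$$u = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_k \end{bmatrix} \sim N(0, G)$$

where

$$G = \begin{bmatrix} \sigma_1^2 I_{n_1} & 0 & \cdots & 0 \\ 0 & \sigma_2^2 I_{n_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_k^2 I_{n_k} \end{bmatrix}$$

Using these assumptions, the marginal distribution of $y$ can be expressed as

$$y \sim N(X\beta, \Sigma)$$

where

$$\mathrm{Var}(y) = \Sigma = \sum_{i=1}^{k} \sigma_i^2 Z_i Z_i' + \sigma_\varepsilon^2 I_N = ZGZ' + \sigma_\varepsilon^2 I_N$$

This model with its assumptions implies that $\mathrm{Cov}(u, y) = GZ'$.

Using the above information, the joint distribution of $y$ and $u$ is

$$\begin{bmatrix} u \\ y \end{bmatrix} \sim N\left( \begin{bmatrix} 0 \\ X\beta \end{bmatrix},\; \begin{bmatrix} G & GZ' \\ ZG & \Sigma \end{bmatrix} \right)$$

The conditional distribution of $y$ given $u$ is $y \mid u \sim N(X\beta + Zu, \sigma_\varepsilon^2 I)$ and the marginal
distribution of $u$ is $u \sim N(0, G)$. Thus the joint distribution of $y$ and $u$ can be expressed as the
product of the conditional distribution of $y$ given $u$ and the marginal distribution of $u$, as

$$\begin{aligned}
h(y, u) &= f(y \mid u)\,g(u) \\
&= [2\pi\sigma_\varepsilon^2]^{-N/2} \exp\left[-\frac{1}{2\sigma_\varepsilon^2}(y - X\beta - Zu)'(y - X\beta - Zu)\right] [2\pi]^{-q/2}\,|G|^{-1/2} \exp\left(-\tfrac{1}{2}u'G^{-1}u\right)
\end{aligned}$$

where $q = n_1 + n_2 + \cdots + n_k$.

Henderson (1984) differentiated $-2\log[f(y \mid u)g(u)]$ with respect to $\beta$ and $u$ to derive the
mixed model equations whose solution provides the BLUE of estimable functions of $\beta$ and
BLUPs of $u$ as:

$$\begin{aligned}
-2\log[f(y \mid u)g(u)] ={}& (N + q)\log(2\pi) + N\log(\sigma_\varepsilon^2) + \log|G| \\
&+ \frac{1}{\sigma_\varepsilon^2}(y - X\beta - Zu)'(y - X\beta - Zu) + u'G^{-1}u \\
={}& h \quad \text{(say)}
\end{aligned}$$

$$\frac{\partial h}{\partial \beta} = -\frac{2}{\sigma_\varepsilon^2}X'(y - X\beta - Zu), \qquad \frac{\partial h}{\partial u} = -\frac{2}{\sigma_\varepsilon^2}Z'(y - X\beta - Zu) + 2G^{-1}u$$

Setting the derivatives equal to zero provides the mixed model equations.

$$\begin{bmatrix} X'X & X'Z \\ Z'X & Z'Z + \sigma_\varepsilon^2 G^{-1} \end{bmatrix} \begin{bmatrix} \hat\beta_{BLUE} \\ \hat u \end{bmatrix} = \begin{bmatrix} X'y \\ Z'y \end{bmatrix}$$

If the variance of $\varepsilon$ (the residual) is $R$ then the mixed model equations become

$$\begin{bmatrix} X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z + G^{-1} \end{bmatrix} \begin{bmatrix} \hat\beta_{BLUE} \\ \hat u \end{bmatrix} = \begin{bmatrix} X'R^{-1}y \\ Z'R^{-1}y \end{bmatrix}$$

The solutions to the mixed model equations are

$$\hat\beta = (X'\Sigma^{-1}X)^{-}X'\Sigma^{-1}y \quad \text{and} \quad \hat u = GZ'\Sigma^{-1}(y - X\hat\beta)$$


In general the elements of $\Sigma$ are not known and need to be estimated, thus the estimated
BLUE (EBLUE) and the estimated BLUP (EBLUP) are

$\hat\beta_W = (X'\hat\Sigma^{-1}X)^{-}X'\hat\Sigma^{-1}y$ and $\hat u = \hat G Z'\hat\Sigma^{-1}(y - X\hat\beta_W)$, respectively.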
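
The equivalence between the solution of the mixed model equations and the generalized least squares and BLUP formulas can be checked numerically. The sketch below assumes a hypothetical layout (two treatments in each of three random blocks), made-up responses, and known variance components 2.0 and 1.0; these values are illustrative only.

```python
import numpy as np

t, b = 2, 3                                   # treatments, random blocks
X = np.kron(np.ones((b, 1)), np.eye(t))       # 6 x 2 fixed-effects design
Z = np.kron(np.eye(b), np.ones((t, 1)))       # 6 x 3 random-effects design
y = np.array([5.1, 7.9, 6.2, 9.0, 4.8, 8.1])  # hypothetical responses
s2_u, s2_e = 2.0, 1.0                         # assumed known variance components
G = s2_u * np.eye(b)

# Mixed model equations: coefficient matrix and right-hand side.
C = np.block([[X.T @ X, X.T @ Z],
              [Z.T @ X, Z.T @ Z + s2_e * np.linalg.inv(G)]])
rhs = np.concatenate([X.T @ y, Z.T @ y])
sol = np.linalg.solve(C, rhs)
beta_mme, u_mme = sol[:t], sol[t:]

# Direct formulas based on Sigma = Z G Z' + sigma_e^2 I.
Sigma_inv = np.linalg.inv(Z @ G @ Z.T + s2_e * np.eye(t * b))
beta_gls = np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv @ y)
u_blup = G @ Z.T @ Sigma_inv @ (y - X @ beta_gls)
print(np.allclose(beta_mme, beta_gls), np.allclose(u_mme, u_blup))
```

One practical appeal of the mixed model equations is that they avoid inverting the $N \times N$ matrix $\Sigma$; only $G$ (here diagonal) needs to be inverted.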


22.6 Concluding Remarks 

This chapter presents a theoretical development of a mixed model to describe experiments 
where the factors in the treatment and design structures involve both fixed and random 
effects. The analysis of the mixed model involves estimating and making inferences about 
the variance components and functions of the fixed effect parameters. As for the random 
effects part of the model, tests of hypotheses can be carried out using a sums of squares 
method rather than using the asymptotic sampling distribution of the estimates of the 
variance components. The analysis of the fixed effects part of the model was examined in 
order to make inferences about the fixed effects. Confidence intervals and tests of
hypotheses can be carried out using the asymptotic sampling distributions of the estimates of
estimable functions or approximate F-statistics where the denominator degrees of freedom 
are determined via a Satterthwaite type approximation. Best linear unbiased predictions 
of random effects are discussed as well as the mixed model equations. 


22.7 Exercises 

22.1 For the model used to describe data from a one-way treatment structure in a
randomized complete block design structure,

$$y_{ij} = \mu_i + b_j + \varepsilon_{ij}, \quad i = 1, 2, \ldots, t, \quad j = 1, 2, \ldots, b, \quad b_j \sim \text{i.i.d. } N(0, \sigma_{blk}^2), \quad \varepsilon_{ij} \sim \text{i.i.d. } N(0, \sigma_\varepsilon^2)$$

1) Determine the ML estimates of the model's parameters.

2) Determine the REML estimates of the model's parameters.

3) Determine the BLUE of $\mu$, the vector of treatment means.

4) Determine the BLUP of $b$, the vector of block effects.

22.2 Five school districts were carrying out a study to evaluate the effectiveness of
elementary school math teachers. Within each district, three elementary schools
were randomly selected. Within each of the selected elementary schools, four
teachers were randomly selected. Each teacher was given a questionnaire and
the response was the total number of positive answers. Assume the districts are
fixed effects, the schools within a district are random effects and the teachers
within a school are random effects.

1) Write out the model with the assumptions.

2) Obtain the BLUEs of the district means.

3) Obtain the REML estimates of the variance components.

4) Obtain the BLUP of the school effects and of the teacher effects.

Chapter 22 discussed methods for analyzing data for balanced and unbalanced mixed
models. This chapter presents detailed analyses using examples for each situation. The
data for the unbalanced case are obtained by randomly deleting some observations from the
data for the balanced situation. The study involved a company wanting to replace machines
used to make a certain component in one of its factories. Three different brands of machines
were available, so the management designed an experiment to evaluate the productivity of
the three machines when operated by the company's own personnel. Six employees
(persons) were randomly selected from the population of employees that are trained to operate
such machines. Each selected employee was required to operate each machine during three
different shifts. The data recorded were overall productivity scores that took into account
the number and quality of components produced. The data are given in Table 23.1.


23.1 Two-Way Mixed Model 

The treatment structure for this experiment is a two-way with machines being a fixed 
effect and persons being a random effect. The design structure is a completely randomized 
design. The two-way mixed model used to describe data from b persons each operating 
the t machines during n different shifts is 

$$y_{ijk} = \mu + \tau_i + p_j + (\tau p)_{ij} + \varepsilon_{ijk}, \quad i = 1, 2, \ldots, t, \quad j = 1, 2, \ldots, b, \quad k = 1, 2, \ldots, n$$

where $\mu$ denotes the average productivity score of the set of brands of machines as operated
by the population of workers, $\tau_i$ denotes the effect of the ith machine on productivity scores,
$p_j$ denotes the random effect of the jth person on productivity scores, $(\tau p)_{ij}$ denotes the
random interaction effect on productivity scores that is specific to the jth person operating
the ith machine, and $\varepsilon_{ijk}$ denotes the random error term associated with the kth time the jth
person operates the ith machine.

TABLE 23.1

Productivity Scores for Machine-Person Example

                    Data for Balanced Case         Data for Unbalanced Case
                    (Section 23.1)                 (Section 23.2)
Machine   Person    Rate_1    Rate_2    Rate_3     Mrate_1   Mrate_2   Mrate_3
1         1         52.0      52.8      53.1       52.0      -         -
1         2         51.8      52.8      53.1       51.8      52.8      -
1         3         60.0      60.2      58.4       60.0      -         -
1         4         51.1      52.3      50.3       51.1      52.3      -
1         5         50.9      51.8      51.4       50.9      51.8      51.4
1         6         46.4      44.8      49.2       46.4      44.8      49.2
2         1         62.1      62.6      64.0       -         -         64.0
2         2         59.7      60.0      59.0       59.7      60.0      59.0
2         3         68.6      65.8      69.7       68.6      65.8      -
2         4         63.2      62.8      62.2       63.2      62.8      62.2
2         5         64.8      65.0      65.4       64.8      65.0      -
2         6         43.7      44.2      43.0       43.7      44.2      43.0
3         1         67.5      67.2      66.9       67.5      67.2      66.9
3         2         61.5      61.7      62.3       61.5      61.7      62.3
3         3         70.8      70.6      71.0       70.8      70.6      71.0
3         4         64.1      66.2      64.0       64.1      66.2      64.0
3         5         72.1      72.0      71.1       72.1      72.0      71.1
3         6         62.0      61.4      60.5       62.0      61.4      60.5

Note: a dash marks an observation deleted to create the unbalanced data set.


The additional assumptions about the random variables in this model are 


Pi ~ LLd - N (0, a 2 per J 
(vpli ~ i-i-d. N(0, a 2 mxp ) 
s ijk ~ i.i.d. N(0, of) 


and the p j: (rp) ;; and e ijk are all independent random variables. 

Assuming e ljk ~ i.i.d. N( 0, er 2 ) implies the time intervals between the different shifts when 
measurements are made on a person x machine combination are long enough so that the 
error terms are uncorrelated. 

The first step is to analyze the random effects part of the model. SAS®-Mixed code was
used to obtain the method-of-moments, REML, maximum likelihood, and MIVQUE0
estimates of the three variance components. The type III sums of squares, mean squares,
and their corresponding expected mean squares are shown in Table 23.2. The maximum
likelihood, REML, method-of-moments, and MIVQUE0 estimates of the variance
components are given in Table 23.3. The SAS-Mixed code in Table 23.4 was used to fit the
two-way mixed model and obtain the REML estimates of the variance components. The
estimates from the other three methods were obtained by specifying Method = ML,
Method = MIVQUE0, and Method = type3.
TABLE 23.2

Analysis of Variance Table for the Balanced Data Using Type III Sums of Squares

Source             df   Sum of Squares   Mean Square   F-Value   Pr > F
Machine             2   1755.263333      877.631667    20.58     0.0003
Person              5   1241.895000      248.379000     5.82     0.0089
Person x machine   10    426.530000       42.653000    46.13     <0.0001
Residual           36     33.286667        0.924630    —         —

Source             Expected Mean Square
Machine            Var(Residual) + 3 Var(person x machine) + Q(machine)
Person             Var(Residual) + 3 Var(person x machine) + 9 Var(person)
Person x machine   Var(Residual) + 3 Var(person x machine)
Residual           Var(Residual)
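The entries of Table 23.2 can be reproduced directly from the balanced data in Table 23.1.
The following Python sketch is not part of the SAS analysis described here; it is a minimal
illustration, with hypothetical variable names, of the balanced two-way sums of squares and
of the F ratios implied by the expected mean squares.

```python
# Balanced machine-person data from Table 23.1: data[machine][person] = three ratings.
data = [
    [[52.0, 52.8, 53.1], [51.8, 52.8, 53.1], [60.0, 60.2, 58.4],
     [51.1, 52.3, 50.3], [50.9, 51.8, 51.4], [46.4, 44.8, 49.2]],
    [[62.1, 62.6, 64.0], [59.7, 60.0, 59.0], [68.6, 65.8, 69.7],
     [63.2, 62.8, 62.2], [64.8, 65.0, 65.4], [43.7, 44.2, 43.0]],
    [[67.5, 67.2, 66.9], [61.5, 61.7, 62.3], [70.8, 70.6, 71.0],
     [64.1, 66.2, 64.0], [72.1, 72.0, 71.1], [62.0, 61.4, 60.5]],
]
t, b, n = len(data), len(data[0]), len(data[0][0])  # machines, persons, replicates


def mean(values):
    values = list(values)
    return sum(values) / len(values)


grand = mean(y for m in data for cell in m for y in cell)
machine_means = [mean(y for cell in m for y in cell) for m in data]
person_means = [mean(y for m in data for y in m[j]) for j in range(b)]
cell_means = [[mean(cell) for cell in m] for m in data]

# Sums of squares for the balanced two-way layout.
ss_machine = n * b * sum((mi - grand) ** 2 for mi in machine_means)
ss_person = n * t * sum((pj - grand) ** 2 for pj in person_means)
ss_cells = n * sum((cell_means[i][j] - grand) ** 2
                   for i in range(t) for j in range(b))
ss_inter = ss_cells - ss_machine - ss_person
ss_resid = sum((y - cell_means[i][j]) ** 2
               for i in range(t) for j in range(b) for y in data[i][j])

ms_machine = ss_machine / (t - 1)
ms_person = ss_person / (b - 1)
ms_inter = ss_inter / ((t - 1) * (b - 1))
ms_resid = ss_resid / (t * b * (n - 1))

# Error terms follow the expected mean squares: machine and person are tested
# against the interaction, and the interaction against the residual.
f_machine = ms_machine / ms_inter
f_person = ms_person / ms_inter
f_inter = ms_inter / ms_resid

print(f"MS: machine {ms_machine:.4f}, person {ms_person:.4f}, "
      f"interaction {ms_inter:.4f}, residual {ms_resid:.4f}")
print(f"F: machine {f_machine:.2f}, person {f_person:.2f}, interaction {f_inter:.2f}")
```

Because the design is balanced, these sums of squares do not depend on the order in which
the effects are fitted, so the type I, II, and III analyses coincide.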


TABLE 23.3

Estimates of the Variance Components Using REML, ML,
MIVQUE0, and Method of Moments (Type III)

Covariance Parameter   REML      ML        MIVQUE0   Type III
Person                 22.8584   19.0487   22.8584   22.8584
Person x machine       13.9095   11.5398   13.9095   13.9095
Residual                0.9246    0.9246    0.9246    0.9246


TABLE 23.4 

SAS-Mixed Code to Obtain REML Estimates of the 
Variance Components for the Balanced Data Set 

proc mixed method=reml cl covtest data=ex23bal;
   class person machine;
   model rating=machine/ddfm=kr;
   random person person*machine;
   lsmeans machine/diff;
run;


The first hypothesis to be tested is $H_0\colon \sigma_{m \times p}^2 = 0$ vs $H_a\colon \sigma_{m \times p}^2 > 0$. The statistic to
test this hypothesis, constructed by using the expected mean squares in Table 23.2, is

$$F_{m \times p} = \frac{\text{MSPerson} \times \text{Machine}}{\text{MSResidual}} = 46.13$$

The significance level associated with this F test is less than 0.0001, indicating there is
strong evidence that $\sigma_{m \times p}^2$ is nonzero and is an important contributor to the variability in
the data. If the machine x person interaction variance component were equal to zero, it would
indicate that productivity differences among all employees (not just those in the sample)
are similar for the three machines. It would also indicate that the productivity differences
among the three machines are similar for all employees (not just those in the sample).
In other words, the inferences are to the population of all possible employees and not just
to those employees who were randomly selected from all of the company's employees.
The interpretation of a significant machine x person interaction variance component is that
productivity differences among machines vary depending upon the person using the
machine. One interpretation of a large machine x person interaction variance component
is that some persons are better adapted to some machines than to other machines. In the
data, one can see that person 6 does not perform as well with machines 1 and 2 as the other
persons in the sample, but person 6 does about as well as the other persons in the sample
on machine 3. The second hypothesis to be tested is $H_0\colon \sigma_p^2 = 0$ vs $H_a\colon \sigma_p^2 > 0$. Again, the
statistic to test this hypothesis is constructed by using the expected mean squares in
Table 23.2 and is given by

$$F_p = \frac{\text{MSPerson}}{\text{MSPerson} \times \text{Machine}} = 5.82$$

The significance level associated with $H_0\colon \sigma_p^2 = 0$ vs $H_a\colon \sigma_p^2 > 0$ is 0.0089, indicating that
there is considerable variation in productivity scores among employees at the plant. A
training program may decrease the variability among the employees.

The estimates of the variance components from the four methods are shown in
Table 23.3. The estimates of the variance components are identical for REML, MIVQUE0,
and type III, but those from ML are a little smaller for the person and machine x person
variance components.

The Satterthwaite approximation can be used to determine the degrees of freedom to use
when constructing confidence intervals about the variance components. The first step is to
express the estimate of a variance component as a linear combination of the mean squares
in Table 23.2. The method of moments estimate of $\sigma_p^2$ is
$\hat\sigma_p^2 = \tfrac{1}{9}\text{MSPerson} - \tfrac{1}{9}\text{MSPerson} \times \text{Machine} = 22.8584$.
The number of degrees of freedom of the approximating chi-square
distribution obtained through the Satterthwaite approximation is

$$\nu = \frac{(\hat\sigma_p^2)^2}{\dfrac{\left[\tfrac{1}{9}\text{MSPerson}\right]^2}{5} + \dfrac{\left[\tfrac{1}{9}\text{MSPerson} \times \text{Machine}\right]^2}{10}} = 3.38035$$

The percentage points for a 95% confidence interval are

$$\chi^2_{0.025,\,3.38} = 10.0467 \quad\text{and}\quad \chi^2_{0.975,\,3.38} = 0.30725$$

Thus, an approximate 95% confidence interval about $\sigma_p^2$ is

$$\frac{3.38035(22.8584)}{10.0467} \le \sigma_p^2 \le \frac{3.38035(22.8584)}{0.30725}$$

or

$$7.69102 < \sigma_p^2 < 251.486$$

A 95% confidence interval about $\sigma_p$ is $2.773 < \sigma_p < 15.858$.
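The Satterthwaite computation for the person variance component can be checked numerically.
The Python sketch below is an illustration only; it takes the mean squares from Table 23.2
and the two chi-square percentage points quoted in the text as given (the variable names are
hypothetical, and the percentage points are not recomputed).

```python
import math

# Mean squares and degrees of freedom from Table 23.2 (balanced data).
ms_person, df_person = 248.379, 5
ms_inter, df_inter = 42.653, 10

# Method of moments: sigma_p^2 = (1/9) MSPerson - (1/9) MSPerson x Machine.
c = 1.0 / 9.0
var_person = c * ms_person - c * ms_inter

# Satterthwaite degrees of freedom for this linear combination of mean squares.
nu = var_person ** 2 / ((c * ms_person) ** 2 / df_person
                        + (c * ms_inter) ** 2 / df_inter)

# Chi-square percentage points for 3.38 df, copied from the text.
chi2_025 = 10.0467   # upper 2.5% point
chi2_975 = 0.30725   # upper 97.5% point

lower = nu * var_person / chi2_025
upper = nu * var_person / chi2_975

print(f"estimate = {var_person:.4f}, df = {nu:.5f}")
print(f"95% CI for variance: ({lower:.3f}, {upper:.2f})")
print(f"95% CI for std dev:  ({math.sqrt(lower):.3f}, {math.sqrt(upper):.3f})")
```

The interval is very wide because the estimate carries only about 3.4 effective degrees of
freedom, even though six persons were sampled.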
A confidence interval for $\sigma_{m \times p}^2$ can be obtained in a similar manner, where the approximate
number of degrees of freedom is 9.56989 and the 95% confidence interval is obtained as

$$6.70314 < \sigma_{m \times p}^2 < 44.2384$$

The estimates of the covariance parameters from the type III analysis are given in Table
23.5. The estimates of the variance components and estimates of the standard errors are in
the first two columns. The Z-value column gives the ratio of each estimate to its
corresponding estimated standard error, and the Pr Z column is the two-sided significance
level corresponding to the computed Z-value. The Lower and Upper columns are confidence
intervals computed using the Wald interval; that is, $\hat\sigma^2 \pm Z_{0.025}[\text{s.e.}(\hat\sigma^2)]$. The Wald
confidence intervals are only appropriate when the number of degrees of freedom associated
with the estimated variance component is large. The confidence interval about $\sigma_\varepsilon^2$
(residual) is based on the chi-square distribution with 36 degrees of freedom. Table 23.6
contains the estimates of the covariance parameters from the REML analysis.

The estimates of the variance components and their estimated standard errors in
Tables 23.5 and 23.6 are identical, but the Pr Z columns are different. The significance levels
in Table 23.6 are one-sided (as they should be). The computed numbers of degrees of
freedom, df = 2(Z-value)$^2$, are the same as those computed above for the method of moments
estimators. The confidence interval about the residual variance component is based on the
chi-square distribution using 36 degrees of freedom. Therefore, for balanced designs,
the information about the variance components from REML, MIVQUE0, and method of
moments is identical up to the approximate degrees of freedom associated with the
individual estimates, but the Wald method is used to construct confidence intervals for the
person and machine x person interaction variance components from method of moments,
whereas the more appropriate Satterthwaite approximation is used with REML and MIVQUE0.
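The relation between the Wald limits and the degrees of freedom df = 2(Z-value)^2 can be
illustrated with a few lines of Python. This is a sketch using the estimates and standard
errors reported in Tables 23.5 and 23.6; the dictionary layout and names are assumptions
made for the illustration.

```python
from statistics import NormalDist

# Upper 2.5% point of the standard normal distribution (about 1.96).
z_crit = NormalDist().inv_cdf(0.975)

# (estimate, standard error) pairs as reported in Tables 23.5 and 23.6.
components = {
    "person": (22.8584, 17.5825),
    "person x machine": (13.9095, 6.3587),
}

results = {}
for name, (estimate, std_err) in components.items():
    z_value = estimate / std_err
    wald = (estimate - z_crit * std_err, estimate + z_crit * std_err)
    df = 2 * z_value ** 2  # degrees of freedom used for the chi-square interval
    results[name] = {"z": z_value, "wald": wald, "df": df}
    print(f"{name}: Z = {z_value:.2f}, Wald = ({wald[0]:.4f}, {wald[1]:.4f}), "
          f"df = {df:.4f}")
```

The Wald interval for the person component has a negative lower limit, which is one visible
symptom of using a normal approximation with so few degrees of freedom.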


TABLE 23.5

Estimates of the Covariance Parameters from the Type III Analysis

Covariance Parameter Estimates

Covariance Parameter   Estimate   Standard Error   Z-Value   Pr Z      α      Lower      Upper
Person                 22.8584    17.5825          1.30      0.1936    0.05   -11.6026   57.3195
Person x machine       13.9095     6.3587          2.19      0.0287    0.05     1.4465   26.3724
Residual                0.9246     0.2179          4.24      <0.0001   0.05     0.6115    1.5601


TABLE 23.6

Estimates of the Covariance Parameters from the REML Analysis Including the Computed
Degrees of Freedom Used in Computing the Confidence Intervals

Covariance Parameter   Estimate   Standard Error   Z-Value   Pr Z      α      Lower    Upper     df
Person                 22.8584    17.5825          1.30      0.0968    0.05   7.6910   251.49     3.3804
Person x machine       13.9095     6.3587          2.19      0.0144    0.05   6.7031    44.2384   9.5699
Residual                0.9246     0.2179          4.24      <0.0001   0.05   0.6115     1.5601  36.0000
The methods necessary to estimate and test hypotheses about the fixed effects in a mixed
model depend on whether the data are balanced or unbalanced. The methods for the
unbalanced design are discussed in Section 22.3.

The general mixed model can be expressed as

$$y = X\beta + Z_1 u_1 + Z_2 u_2 + \cdots + Z_k u_k + \varepsilon$$

where $X = [j, X_1]$ and $\beta = (\mu, \tau_1, \tau_2, \ldots, \tau_t)'$.
If the elements of the covariance matrix

$$\Sigma = \sigma_1^2 Z_1 Z_1' + \sigma_2^2 Z_2 Z_2' + \cdots + \sigma_k^2 Z_k Z_k' + \sigma_\varepsilon^2 I$$

are known, that is, the variance components are known, the BLUE (best linear unbiased
estimate) of an estimable function $a'\beta$ is

$$a'\hat\beta_{BLUE} = a'(X'\Sigma^{-1}X)^{-}X'\Sigma^{-1}y$$

For most balanced mixed models, the estimator of $a'\beta$ simplifies to

$$a'\hat\beta_{BLUE} = a'(X'X)^{-}X'y$$

The next example shows this simplification for the balanced two-way mixed model. The
model can be reparameterized as

$$y_{ijk} = \mu_i + p_j + (\tau p)_{ij} + \varepsilon_{ijk}$$

where $\mu_i = \mu + \tau_i$ and $X_1$ is the $nbt \times t$ design matrix corresponding to the fixed effects part
of the model, $\mu = (\mu_1, \mu_2, \ldots, \mu_t)'$.

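Note that

$$X_1 = \begin{bmatrix} j_{nb} & 0 & \cdots & 0 \\ 0 & j_{nb} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & j_{nb} \end{bmatrix} = j_n \otimes j_b \otimes I_t$$

(the Kronecker form uses the same ordering of the observations as the covariance matrix
below).

The covariance matrix for the balanced model is

$$\Sigma = \sigma_p^2 (J_n \otimes I_b \otimes J_t) + \sigma_{m \times p}^2 (J_n \otimes I_b \otimes I_t) + \sigma_\varepsilon^2 (I_n \otimes I_b \otimes I_t)$$

which can be expressed as

$$\Sigma = \lambda_1 \left(\tfrac{1}{n}J_n \otimes I_b \otimes \tfrac{1}{t}J_t\right) + \lambda_2 \left(\tfrac{1}{n}J_n \otimes I_b \otimes \left(I_t - \tfrac{1}{t}J_t\right)\right) + \lambda_3 \left(\left(I_n - \tfrac{1}{n}J_n\right) \otimes I_b \otimes I_t\right)$$

where $\lambda_1 = nt\sigma_p^2 + n\sigma_{m \times p}^2 + \sigma_\varepsilon^2$, $\lambda_2 = n\sigma_{m \times p}^2 + \sigma_\varepsilon^2$, and $\lambda_3 = \sigma_\varepsilon^2$. It can be shown that

$$\Sigma^{-1} = \frac{1}{\lambda_1} \left(\tfrac{1}{n}J_n \otimes I_b \otimes \tfrac{1}{t}J_t\right) + \frac{1}{\lambda_2} \left(\tfrac{1}{n}J_n \otimes I_b \otimes \left(I_t - \tfrac{1}{t}J_t\right)\right) + \frac{1}{\lambda_3} \left(\left(I_n - \tfrac{1}{n}J_n\right) \otimes I_b \otimes I_t\right)$$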
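Then the matrices $X_1'\Sigma^{-1}X_1$ and $X_1'\Sigma^{-1}$ simplify to

$$X_1'\Sigma^{-1}X_1 = \frac{nb}{\lambda_1}\left(\tfrac{1}{t}J_t\right) + \frac{nb}{\lambda_2}\left(I_t - \tfrac{1}{t}J_t\right)$$

and

$$X_1'\Sigma^{-1} = \frac{1}{\lambda_1}\, j_n' \otimes j_b' \otimes \tfrac{1}{t}J_t + \frac{1}{\lambda_2}\, j_n' \otimes j_b' \otimes \left(I_t - \tfrac{1}{t}J_t\right)$$

Next one can show that

$$(X_1'\Sigma^{-1}X_1)^{-1} = \frac{\lambda_1}{nb}\left(\tfrac{1}{t}J_t\right) + \frac{\lambda_2}{nb}\left(I_t - \tfrac{1}{t}J_t\right)$$

Thus the estimator of $\mu$ is

$$\hat\mu = (X_1'\Sigma^{-1}X_1)^{-1}X_1'\Sigma^{-1}y = \left(\tfrac{1}{n}j_n' \otimes \tfrac{1}{b}j_b' \otimes I_t\right)y = (X_1'X_1)^{-1}X_1'y$$

Therefore the estimate of $\mu_i$ is $\hat\mu_i = \bar{y}_{i\cdot\cdot}$, $i = 1, 2, \ldots, t$.
The variance of $\hat\mu$ is

$$\text{Var}(\hat\mu) = (X_1'\Sigma^{-1}X_1)^{-1} = \frac{\sigma_\varepsilon^2 + n\sigma_{m \times p}^2 + nt\sigma_p^2}{nb}\left(\tfrac{1}{t}J_t\right) + \frac{\sigma_\varepsilon^2 + n\sigma_{m \times p}^2}{nb}\left(I_t - \tfrac{1}{t}J_t\right)$$

Thus the variance of $\hat\mu_i$ is

$$\text{Var}(\hat\mu_i) = \frac{\sigma_\varepsilon^2 + n\sigma_{m \times p}^2 + n\sigma_p^2}{nb}$$

For the machine-person example, the variance of each machine mean is

$$\text{Var}(\hat\mu_i) = \frac{\sigma_\varepsilon^2 + 3\sigma_{m \times p}^2 + 3\sigma_p^2}{18}$$

The estimate of a contrast $a'\mu$ (where $a'j_t = 0$) is $a'\hat\mu$ with variance

$$\text{Var}(a'\hat\mu) = \frac{\sigma_\varepsilon^2 + n\sigma_{m \times p}^2}{nb}\, a'a$$

The estimate of the difference $\mu_i - \mu_{i'}$ ($i \ne i'$) is $\hat\mu_i - \hat\mu_{i'}$ with variance

$$\text{Var}(\hat\mu_i - \hat\mu_{i'}) = \frac{2(\sigma_\varepsilon^2 + n\sigma_{m \times p}^2)}{nb}$$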
The estimate of the standard error of the difference between two machine means is

$$\widehat{\text{s.e.}}(\hat\mu_i - \hat\mu_{i'}) = \sqrt{\frac{2(\text{MSPerson} \times \text{Machine})}{18}} = 2.177$$

Table 23.7 contains the estimated machine means, estimated standard errors, t-values,
and significance levels for testing that the individual means are equal to zero. Pairwise
differences among the machine means are in Table 23.8, which includes the estimated
differences, estimated standard errors, t-values, and significance levels. An $\text{LSD}_{0.05}$ value
for comparing two means is computed as

$$\text{LSD}_{0.05} = (t_{0.025,10})\,\widehat{\text{s.e.}}(\hat\mu_i - \hat\mu_{i'}) = 2.228(2.177) = 4.85$$

All of the differences are greater than 4.85 in absolute value, so the LSD shows the machine
means are all significantly different from one another.
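The standard errors and the LSD value quoted above follow from the variance formulas for the
balanced model. The Python sketch below is illustrative only: it plugs the estimates from
Tables 23.2 and 23.3 into those formulas and takes the t percentage point 2.228 from the text.

```python
import math

n, b = 3, 6                      # replicates per cell, persons
ms_inter = 42.653                # MSPerson x Machine from Table 23.2
var_resid, var_inter, var_person = 0.9246, 13.9095, 22.8584  # Table 23.3 (REML)

# Var(mean_i) = (sigma_e^2 + n*sigma_mxp^2 + n*sigma_p^2) / (n*b)
se_mean = math.sqrt((var_resid + n * var_inter + n * var_person) / (n * b))

# Var(mean_i - mean_i') = 2*(sigma_e^2 + n*sigma_mxp^2) / (n*b),
# estimated by 2*MSPerson x Machine / (n*b).
se_diff = math.sqrt(2 * ms_inter / (n * b))

t_crit = 2.228                   # t_{0.025,10}, as quoted in the text
lsd = t_crit * se_diff

# Pairwise differences of machine means (Table 23.8).
differences = {(1, 2): -7.9667, (1, 3): -13.9167, (2, 3): -5.9500}
all_significant = all(abs(d) > lsd for d in differences.values())

print(f"s.e.(mean) = {se_mean:.4f}, s.e.(difference) = {se_diff:.4f}")
print(f"LSD(0.05) = {lsd:.2f}, all pairs differ: {all_significant}")
```

Note that the person variance component enters the standard error of a single machine mean
but cancels from the standard error of a difference, which is why the latter is smaller.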

To test the hypothesis that the means are equal, $H_0\colon \mu_1 = \mu_2 = \mu_3$ vs $H_a\colon$ (not $H_0$), use the
expected mean squares from Table 23.2 to construct the test statistic

$$F = \frac{\text{MSMachine}}{\text{MSPerson} \times \text{Machine}} = 20.58$$

The significance level corresponding to the equal means hypothesis is 0.0003, based on 2
and 10 degrees of freedom (see Table 23.2). This example shows that the analysis is quite
straightforward for the balanced case. However, the analysis of the unbalanced case is not
quite as easy.
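The significance level of the machine test can be checked without statistical tables, because
an F distribution with 2 numerator degrees of freedom has the closed-form upper tail
P(F > f) = (1 + 2f/d)^(-d/2), where d is the denominator degrees of freedom. The Python
sketch below uses this identity with the mean squares of Table 23.2; it is an illustration and
applies only when the numerator has exactly 2 degrees of freedom.

```python
# Mean squares from Table 23.2 (balanced data).
ms_machine = 877.631667
ms_inter = 42.653
df_num, df_den = 2, 10

# Machine is tested against the person x machine mean square.
f_stat = ms_machine / ms_inter

# Exact upper-tail probability for an F(2, d) variable; valid only for df_num == 2.
p_value = (1 + df_num * f_stat / df_den) ** (-df_den / 2)

print(f"F = {f_stat:.2f} on ({df_num}, {df_den}) df, p = {p_value:.6f}")
```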


TABLE 23.7

Machine Means and Estimated Standard Errors for the Balanced Data Set

Least Squares Means

Effect    Machine   Estimate   Standard Error   df     t-Value   Pr > |t|
Machine   1         52.3556    2.4858           8.52   21.06     <0.0001
Machine   2         60.3222    2.4858           8.52   24.27     <0.0001
Machine   3         66.2722    2.4858           8.52   26.66     <0.0001


TABLE 23.8

Pairwise Differences among the Machine Means for the Balanced Data Set

Differences of Least Squares Means

Effect    Machine   Machine   Estimate   Standard Error   df   t-Value   Pr > |t|
Machine   1         2          -7.9667   2.1770           10   -3.66     0.0044
Machine   1         3         -13.9167   2.1770           10   -6.39     <0.0001
Machine   2         3          -5.9500   2.1770           10   -2.73     0.0211
23.2 Unbalanced Two-Way Mixed Model

The data in Table 23.1 for this example are the same as those for the balanced example
except that some observations have been randomly deleted. This has been done to
demonstrate the problems that occur when analyzing unbalanced data sets and to compare the
estimation procedures for the balanced and unbalanced cases. The model to describe these
data is identical to that for the balanced data set in Section 23.1, or

$$y_{ijk} = \mu + \tau_i + p_j + (\tau p)_{ij} + \varepsilon_{ijk}, \quad i = 1, 2, \ldots, t, \quad j = 1, 2, \ldots, b, \quad k = 1, 2, \ldots, n_{ij}$$

The analysis of variance table for the unbalanced data is shown in Table 23.9. The sums of
squares are those obtained by the method of fitting constants, or type I sums of squares;
that is,

$$\text{SSMachine} = R(\tau \mid \mu)$$
$$\text{SSPerson} = R(p \mid \mu, \tau)$$

and

$$\text{SSPerson} \times \text{Machine} = R((\tau p) \mid \mu, \tau, p)$$

Their corresponding expected mean squares are included in Table 23.9.

First, consider estimating $\sigma_p^2$. In the unbalanced case the coefficients on $\sigma_{m \times p}^2$ in the
expected mean squares of MSPerson and MSPerson x Machine are not the same. Thus in


TABLE 23.9

Analysis of Variance Table Based on Type I Sums of Squares for Unbalanced Data Set

Type I Analysis of Variance

Source             df   Sum of Squares   Mean Square   Error df   F-Value
Machine             2   1648.664722      824.332361    11.782     16.86
Person              5   1008.763583      201.752717     9.9549     4.48
Person x machine   10    404.315028       40.431503    26         46.34
Residual           26     22.686667        0.872564    —          —

Source             Expected Mean Square
Machine            Var(Residual) + 2.6115 Var(person x machine) + 0.1569 Var(person) + Q(machine)
Person             Var(Residual) + 2.5866 Var(person x machine) + 7.219 Var(person)
Person x machine   Var(Residual) + 2.3162 Var(person x machine)
Residual           Var(Residual)

Source             Error Term
Machine            0.0217 MS(person) + 1.1032 MS(person x machine) - 0.1249 MS(Residual)
Person             1.1167 MS(person x machine) - 0.1167 MS(Residual)
Person x machine   MS(Residual)
Residual           —
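order to estimate $\sigma_p^2$, one will need to find a function of all three expected mean squares
that is equal to $\sigma_p^2$. It can be shown that

$$\sigma_p^2 = \frac{1}{7.219}\left[(\sigma_\varepsilon^2 + 2.5866\sigma_{m \times p}^2 + 7.2190\sigma_p^2) - \frac{2.5866}{2.3162}(\sigma_\varepsilon^2 + 2.3162\sigma_{m \times p}^2) + \left(\frac{2.5866}{2.3162} - 1\right)\sigma_\varepsilon^2\right]$$

$$= \frac{1}{7.219}\left[(\sigma_\varepsilon^2 + 2.5866\sigma_{m \times p}^2 + 7.2190\sigma_p^2) - 1.1167(\sigma_\varepsilon^2 + 2.3162\sigma_{m \times p}^2) + (0.1167)\sigma_\varepsilon^2\right]$$

$$= \frac{1}{7.219}\left[E(\text{MSPerson}) - 1.1167\,E(\text{MSPerson} \times \text{Machine}) + (0.1167)\,E(\text{MSResidual})\right]$$

The method of moments equations for the type I sums of squares are

$$\text{MSPerson} = \hat\sigma_\varepsilon^2 + 2.5866\hat\sigma_{m \times p}^2 + 7.2190\hat\sigma_p^2$$
$$\text{MSPerson} \times \text{Machine} = \hat\sigma_\varepsilon^2 + 2.3162\hat\sigma_{m \times p}^2$$

and

$$\text{MSResidual} = \hat\sigma_\varepsilon^2$$

Solving the above equations simultaneously gives the type I method of moments estimators
of each of the variance components as

$$\hat\sigma_p^2 = \frac{1}{7.219}\left[\text{MSPerson} - 1.1167\,\text{MSPerson} \times \text{Machine} + 0.1167\,\text{MSResidual}\right]$$
$$= \frac{1}{7.219}\left[201.7527 - 1.1167(40.4315) + 0.1167(0.8726)\right] = 21.7073$$

$$\hat\sigma_{m \times p}^2 = \frac{1}{2.3162}\left[\text{MSPerson} \times \text{Machine} - \text{MSResidual}\right]$$
$$= \frac{1}{2.3162}\left[40.4315 - 0.8726\right] = 17.0792$$

and

$$\hat\sigma_\varepsilon^2 = \text{MSResidual} = 0.8726$$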
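Because the three method of moments equations form a triangular system, they can be solved
by back-substitution, starting from the residual. The Python sketch below illustrates this
using the type I mean squares and expected mean square coefficients from Table 23.9; the
variable names are hypothetical.

```python
# Type I mean squares for the unbalanced data (Table 23.9, rounded as in the text).
ms_person, ms_inter, ms_resid = 201.7527, 40.4315, 0.8726

# Coefficients of the variance components in the expected mean squares.
person_coef_inter, person_coef_person = 2.5866, 7.2190   # E(MSPerson)
inter_coef_inter = 2.3162                                # E(MSPerson x Machine)

# Back-substitution: residual first, then interaction, then person.
var_resid = ms_resid
var_inter = (ms_inter - var_resid) / inter_coef_inter
var_person = (ms_person - var_resid - person_coef_inter * var_inter) / person_coef_person

print(f"residual = {var_resid:.4f}")
print(f"person x machine = {var_inter:.4f}")
print(f"person = {var_person:.4f}")
```

The person estimate agrees with the value in the text up to rounding of the coefficient
1.1167 = 2.5866/2.3162.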

The estimates of the variance components obtained from the REML, ML, MIVQUE0,
type I, type II, and type III methods are given in Tables 23.10-23.13. The solution for the
residual variance component using the MIVQUE0 method is 0. (If one uses the no-bound
option in SAS-Mixed, the solution is negative. If there is one variance component we should
always be able to estimate, it is the residual.) The summary in Table 23.13 indicates that the
REML estimates lie in the middle of the estimates from the other methods.

The SAS-Mixed code used to obtain these results is given in Table 23.14, where
Method = REML can be replaced by ML, MIVQUE0, type1, type2, or type3 to produce the
other results. The tables contain the estimates of the variance components, estimated
standard errors, recomputed degrees of freedom, confidence intervals, and recomputed





TABLE 23.10

Estimates of the Residual Variance Component for the Unbalanced Data Set

Method     Estimate   Standard Error   Z-Value   df      Lower   Upper   Newlower   Newupper
REML       0.87       0.24             3.61      26.10   0.54    1.63    0.54       1.63
MIVQUE0    0.00       0.00             —         —       —       —       —          —
ML         0.87       0.24             3.62      26.14   0.54    1.63    0.54       1.63
Type III   0.87       0.24             3.58      25.66   0.54    1.64    0.54       1.65
Type II    0.87       0.24             3.59      25.71   0.54    1.64    0.54       1.65
Type I     0.87       0.24             3.59      25.71   0.54    1.64    0.54       1.65

TABLE 23.11

Estimates of the Person Variance Component for the Unbalanced Data Set

Method     Estimate   Standard Error   Z-Value   df      Lower    Upper    Newlower   Newupper
REML       22.46      17.41            1.29       3.33     7.51   254.64    7.51      254.64
MIVQUE0    24.34      10.79            2.26      10.18    11.95    74.00   11.95       74.00
ML         18.70      13.24            1.41       3.99     6.71   155.14    6.71      155.14
Type III   24.26      21.27            1.14       2.60   -17.44    65.95    7.34      464.28
Type II    21.71      17.81            1.22       2.97   -13.20    56.62    6.94      308.07
Type I     21.71      17.81            1.22       2.97   -13.20    56.62    6.94      308.07

TABLE 23.12

Estimates of the Person x Machine Variance Component for the Unbalanced Data Set

Method     Estimate   Standard Error   Z-Value   df      Lower   Upper   Newlower   Newupper
REML       14.23      6.52             2.18       9.55    6.85   45.35   6.85       45.35
MIVQUE0    16.15      8.53             1.89       7.17    7.12   65.44   7.12       65.44
ML         11.82      4.96             2.38      11.35    5.98   33.37   5.98       33.37
Type III   17.08      9.57             1.78       6.37   -1.68   35.84   7.24       77.75
Type II    17.08      9.57             1.78       6.37   -1.68   35.84   7.24       77.76
Type I     17.08      9.57             1.78       6.37   -1.68   35.84   7.24       77.76

TABLE 23.13

Summary of the Estimates of the Variance Components for the Unbalanced Data Set

Covariance Parameter   REML      ML        MIVQUE0   Type I    Type II   Type III
Person                 22.4551   18.7009   24.3389   21.7069   21.7069   24.2571
Person x machine       14.2340   11.8176   16.1548   17.0791   17.0791   17.0791
Residual                0.8709    0.8701    0.0000    0.8726    0.8726    0.8726

TABLE 23.14

SAS-Mixed Code to Produce the REML Estimates and Analysis for the Unbalanced Data Set

proc mixed method=reml cl covtest data=ex23unb;
   class person machine;
   model rating=machine/ddfm=kr;
   random person person*machine;
   lsmeans machine/diff;
run;
Satterthwaite confidence intervals. Recall that the confidence intervals for the variance
components for REML, ML, and MIVQUE0 are based on the chi-square distribution using
df = 2(Z-value)$^2$. The lower and newlower limits and the upper and newupper limits are
identical for these methods. Intervals are provided for variance components (other than
residual) using the Wald interval when Method is type I, type II, or type III. The newlower
and newupper values are the recomputed confidence intervals and would be the intervals of
choice.

The F-statistics in Table 23.9 can be used to test hypotheses about the variance components.
The statistic to test $H_0\colon \sigma_p^2 = 0$ vs $H_a\colon \sigma_p^2 > 0$ is F = 4.48 with 5 numerator and 9.9549
denominator degrees of freedom. The significance level is approximately 0.02. The statistic
to test $H_0\colon \sigma_{m \times p}^2 = 0$ vs $H_a\colon \sigma_{m \times p}^2 > 0$ is F = 46.34 with 10 numerator and 26 denominator
degrees of freedom. The significance level is <0.0001. The likelihood ratio test can also be
used to provide tests of the hypotheses about the variance components. The process
involves obtaining ML estimates of the variance components for the full model and
computing -2 log(full likelihood function), as is shown in Table 23.15. To test
$H_0\colon \sigma_{m \times p}^2 = 0$ vs $H_a\colon \sigma_{m \times p}^2 > 0$, fit a model without person x machine and compute
-2 log(reduced likelihood function). The likelihood ratio test statistic is computed as
LR test = -2 log(reduced likelihood function) - [-2 log(full likelihood function)], which equals
55.5665 and is asymptotically distributed as a chi-square distribution with 1 degree of
freedom. The significance level for this test is <0.0001. To test $H_0\colon \sigma_p^2 = 0$ vs $H_a\colon \sigma_p^2 > 0$,
fit a model without person and compute -2 log(reduced likelihood function). The likelihood
ratio test statistic is computed as LR test = -2 log(reduced likelihood function) -
[-2 log(full likelihood function)], which equals 6.4226 and is asymptotically distributed as a
chi-square distribution with 1 degree of freedom. The significance level for this test is 0.0113.
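The likelihood ratio computations can be reproduced from the -2 log-likelihood values in
Table 23.15. The Python sketch below is an illustration; it uses the fact that the upper-tail
probability of a chi-square variable with 1 degree of freedom equals erfc(sqrt(x/2)), and the
function and variable names are assumptions made for the example.

```python
import math

# -2 log(likelihood) values from Table 23.15 (ML fits to the unbalanced data).
neg2ll_full = 191.506
neg2ll_no_interaction = 247.073   # model without person x machine
neg2ll_no_person = 197.929        # model without person


def chi2_1df_upper_tail(x):
    """P(X > x) for a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


# LR statistic = -2 log(reduced likelihood) - [-2 log(full likelihood)].
lr_interaction = neg2ll_no_interaction - neg2ll_full
lr_person = neg2ll_no_person - neg2ll_full

p_interaction = chi2_1df_upper_tail(lr_interaction)
p_person = chi2_1df_upper_tail(lr_person)

print(f"person x machine: LR = {lr_interaction:.3f}, p = {p_interaction:.2e}")
print(f"person:           LR = {lr_person:.3f}, p = {p_person:.4f}")
```

The statistics differ from the tabled values in the last digit only because the tabled
-2 log-likelihoods are rounded to three decimals.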

The estimates of the fixed effects parameters are computed by

$$\hat\beta_W = (X'\hat\Sigma^{-1}X)^{-}X'\hat\Sigma^{-1}y$$

where $\hat\Sigma$ is the estimated covariance matrix evaluated at the estimates of the variance
components from the specified method of estimating the variance components, and the
variance-covariance matrix of $\hat\beta_W$ is given by $\text{Var}(\hat\beta_W) = (X'\hat\Sigma^{-1}X)^{-}$. The test for the
equality of the machine means was obtained for each method of estimating the variance
components. The type III tests for fixed effects are summarized in Table 23.16. The MIVQUE0
results are not useful because the residual variance estimate is zero. The number of
denominator degrees of freedom from REML is 10.1, which is close to the 10 one would use
for the balanced data set. The ML method has more denominator degrees of freedom (12.2)
and the type I-III methods have fewer (6.7).

Tables 23.17-23.19 contain the estimates of the machines' means. There is not much
difference between the REML and the type I, II, or III results. Tables 23.20-23.22 contain the

TABLE 23.15

ML Estimates of the Variance Components for Three Models for Constructing Likelihood
Ratio Tests for Hypotheses about $\sigma_p^2$ and $\sigma_{m \times p}^2$

Type                  Covariance Parameter   Estimate   -2 log(LH)   LR test   Pr chi
Full                  Person                 18.7009    191.506
Full                  Person x machine       11.8176
Full                  Residual                0.8701
No machine x person   Person                 20.1456    247.073      55.5665   <0.0001
No machine x person   Residual               11.2266
                      (Test for $H_0\colon \sigma_{m \times p}^2 = 0$ vs $H_a\colon \sigma_{m \times p}^2 > 0$)
No person             Person x machine       30.5079    197.929       6.4226   0.0113
No person             Residual                0.8719
                      (Test for $H_0\colon \sigma_p^2 = 0$ vs $H_a\colon \sigma_p^2 > 0$)
TABLE 23.16

Summary of the Type III Tests for Fixed Effects for the Unbalanced Data Set

Method     Effect    Num df   Den df   F-Value        Pr > F
REML       Machine   2        10.1     19.96          0.0003
MIVQUE0    Machine   2         6.58    2.4 x 10^14    <0.0001
ML         Machine   2        12.2     23.91          <0.0001
Type III   Machine   2         6.7     16.72          0.0025
Type II    Machine   2         6.7     16.72          0.0025
Type I     Machine   2         6.7     16.72          0.0025


TABLE 23.17

Estimates of the Mean for Machine 1

Method     Estimate   Standard Error   df      t-Value   Pr > |t|
REML       52.35      2.49              8.72   21.02     <0.0001
MIVQUE0    46.80      0.00              6.58   ∞         <0.0001
ML         52.35      2.27             10.5    23.01     <0.0001
Type III   52.35      2.64              7.22   19.82     <0.0001
Type II    52.35      2.56              8.88   20.45     <0.0001
Type I     52.35      2.56              8.88   20.45     <0.0001


TABLE 23.18

Estimates of the Mean for Machine 2

Method     Estimate   Standard Error   df      t-Value   Pr > |t|
REML       60.32      2.49              8.68   24.25     <0.0001
MIVQUE0    43.63      0.00              6.58   ∞         <0.0001
ML         60.31      2.27             10.4    26.55     <0.0001
Type III   60.32      2.64              7.18   22.86     <0.0001
Type II    60.32      2.56              8.83   23.59     <0.0001
Type I     60.32      2.56              8.83   23.59     <0.0001


TABLE 23.19

Estimates of the Mean for Machine 3

Method     Estimate   Standard Error   df      t-Value   Pr > |t|
REML       66.27      2.48              8.61   26.69     <0.0001
MIVQUE0    61.30      0.00              6.58   ∞         <0.0001
ML         66.27      2.27             10.3    29.25     <0.0001
Type III   66.27      2.63              7.13   25.16     <0.0001
Type II    66.27      2.55              8.77   25.97     <0.0001
Type I     66.27      2.55              8.77   25.97     <0.0001
TABLE 23.20

Estimates of the Difference between the Means for Machines 1 and 2

Method     Estimate   Standard Error   df      t-Value   Pr > |t|
REML       -7.96      2.21             10.2    -3.60     0.0047
MIVQUE0     3.17      0.00              6.58   ∞         <0.0001
ML         -7.96      2.02             12.3    -3.93     0.0019
Type III   -7.97      2.42              6.75   -3.29     0.0140
Type II    -7.97      2.42              6.75   -3.29     0.0140
Type I     -7.97      2.42              6.75   -3.29     0.0140


TABLE 23.21

Estimates of the Difference between the Means for Machines 1 and 3

Method     Estimate   Standard Error   df      t-Value   Pr > |t|
REML       -13.92     2.21             10.1    -6.30     <0.0001
MIVQUE0    -14.50     0.00              6.58   -∞        <0.0001
ML         -13.92     2.02             12.1    -6.89     <0.0001
Type III   -13.92     2.41              6.7    -5.76     0.0008
Type II    -13.92     2.41              6.7    -5.76     0.0008
Type I     -13.92     2.41              6.7    -5.76     0.0008


TABLE 23.22

Estimates of the Difference between the Means for Machines 2 and 3

Method     Estimate   Standard Error   df      t-Value   Pr > |t|
REML        -5.96     2.21             10      -2.70     0.0222
MIVQUE0    -17.67     0.00              6.58   -∞        <0.0001
ML          -5.96     2.01             12.1    -2.96     0.0119
Type III    -5.95     2.41              6.66   -2.47     0.0446
Type II     -5.95     2.41              6.66   -2.47     0.0446
Type I      -5.95     2.41              6.66   -2.47     0.0446

TABLE 23.23

Type III Tests of Equal Machine Means for the Unbalanced Data Set

Type III Tests of Fixed Effects

Effect    Num df   Den df   F-Value   Pr > F
Machine   2        10.1     19.96     0.0003


estimates of the differences between each pair of machine means for each of the methods
of estimating the variance components. The methods do differ: the denominator degrees of
freedom for REML are approximately equal to 10, as would be expected if the data set were
balanced, whereas those for the other methods are not. The REML method seems to provide
the best all-around results.

The test of the hypothesis that the means are equal, $H_0\colon \mu_1 = \mu_2 = \mu_3$ vs $H_a\colon$ (not $H_0$), based
on REML estimates is displayed in Table 23.23, where the computed F-value is 19.96 based on
2 numerator degrees of freedom and 10.1 denominator degrees of freedom. The denominator
degrees of freedom from the balanced data set analysis is 10 (which corresponds to the
degrees of freedom for the machine x person interaction), so 10.1 is a close approximation.
The results in Tables 23.20-23.22 are the pairwise differences between each pair of
machine means. Again, the results based on the REML method of estimating the variance
components seem to be the best, as the degrees of freedom for the pairwise comparisons are
10.2, 10.1, and 10, values that are close to 10, as was the case for the balanced analysis.


23.3 JMP Analysis of the Unbalanced Two-Way Data Set 

The analysis of the unbalanced data set using JMP® involves constructing a data table as
shown in Figure 23.1. The variables machine, person, and rep are declared to be nominal
and rating is continuous. The fit model screen is displayed in Figure 23.2, where rating
has been selected as the "Y" variable, machine is a fixed effect, and person and machine x
person are random effects. The REML method was selected as the method to estimate
the variance components. The REML estimates of the variance components are shown in
Figure 23.3 along with the estimated standard errors and Wald confidence intervals. The
test of the hypothesis that the means are equal, $H_0\colon \mu_1 = \mu_2 = \mu_3$ vs $H_a\colon$ (not $H_0$), is displayed
in Figure 23.4, where the computed F-value is 19.9639 based on 2 numerator degrees of
freedom and 10.11 denominator degrees of freedom.

The least squares means and their estimated standard errors are shown in Figure 23.5.
An LSD multiple comparison method was used to do pairwise comparisons among the
means, and the results, which include the differences, their estimated standard errors, and
95% confidence intervals, are shown in Figure 23.6. The estimation process using JMP
provides the same results as SAS-Mixed using the REML method, except that the confidence
intervals for the variance components are constructed using the Wald method instead of
the Satterthwaite approximation.


[Screenshot: JMP data table ex23unb with columns machine, person, rating, and rep; the
first 12 rows are shown, with missing ratings displayed as dots.]

FIGURE 23.1 JMP data table screen.
[Screenshot: JMP Fit Model dialog. Columns: machine, person, rating, rep. Y: rating.
Personality: Standard Least Squares. Emphasis: Effect Leverage. Method: REML (Recommended).
Options listed: Unbounded Variance Components, Estimate Only Variance Components.]

FIGURE 23.2 JMP fit model screen with response variable and model effects.


REML Variance Component Estimates

Random Effect    Var Ratio   Var Component   Std Error   95% Lower   95% Upper   Pct of Total
person           25.785476   22.455765       17.414105   -11.67588   56.58741     59.785
machine*person   16.344588   14.233991        6.5151618    1.4642735 27.003708    37.896
Residual                      0.8708687       0.2410664    0.5405379  1.6332462    2.319
Total                        37.560625                                           100.000

FIGURE 23.3 REML estimates of the variance components from JMP.


Fixed Effect Tests

Source    Nparm   DF   DFDen   F Ratio   Prob > F
machine   2       2    10.11   19.9639   0.0003*

FIGURE 23.4 Tests of the machine effects from JMP.
Least Squares Means Table

Level   Least Sq Mean   Std Error
1       52.354000       2.4906713
2       60.316445       2.4874448
3       66.272222       2.4826077

FIGURE 23.5 Machine least squares means and estimated standard errors.


LSMeans Differences Student's t

α = 0.050

Each cell (row i, column j) lists Mean[i]-Mean[j], Std Err Dif, Lower CL Dif, and Upper CL Dif.

                          LSMean[j]
LSMean[i]                 1           2           3
1   Mean[i]-Mean[j]       0           -7.9624     -13.918
    Std Err Dif           0            2.21482      2.20942
    Lower CL Dif          0          -12.884      -18.834
    Upper CL Dif          0           -3.0405      -9.0021
2   Mean[i]-Mean[j]       7.96245     0            -5.9558
    Std Err Dif           2.21482     0             2.20578
    Lower CL Dif          3.04047     0           -10.868
    Upper CL Dif         12.8844      0            -1.0435
3   Mean[i]-Mean[j]      13.9182      5.95578       0
    Std Err Dif           2.20942     2.20578       0
    Lower CL Dif          9.00211     1.04348       0
    Upper CL Dif         18.8343     10.8681        0

Level   Letter   Least Sq Mean
3       A        66.272222
2       B        60.316445
1       C        52.354000

Levels not connected by same letter are significantly different.

FIGURE 23.6 LSD multiple comparisons of machine means.


23.4 Concluding Remarks 

The REML, MIVQUE0, and method of moments methods of estimating the variance
components provide identical results for balanced data sets as long as the solutions for the
variance components are positive. The methods also provide identical results for the
analysis of the fixed effects. The REML method seems to provide the best results for the
analysis of the fixed effects when the data set is unbalanced. The main indicator is that the
error degrees of freedom for comparing the machine means using the REML estimates of the
variance components were similar to those for the balanced data set. The results from JMP
using the REML option provided the same estimates of the variance components and of
the fixed effects as obtained from SAS-Mixed using the REML method. Unfortunately, the
confidence intervals that JMP constructs about the variance components use the Wald
method, which is only appropriate when there are a large number of degrees of freedom
associated with each of the estimates of the variance components. For small data sets, the
chi-square confidence intervals using the Satterthwaite approximation to the degrees of
freedom are superior to those provided by the Wald method.


23.5 Exercises 

23.1 Compute the likelihood ratio tests to test H Q : of = 0 vs H a : of > 0 and H 0 : of = 0 vs 
H a : of > 0 for the balanced data set in Table 23.1. 

23.2 The data in the following table are from a study where five states were selected 
at random and three to five cities were randomly selected from the cities within 
a selected state. Within each city, six stores were selected, two of the stores were 
randomly selected from all of the locally owned stores in the city, two of the 
stores were randomly selected from all of the convenience stores in the city, and 
two of the stores were randomly selected from all of the chain stores in the city. 
The price of one pound of coffee was determined for each store. 

1) What type of model should be used to describe this data? What are the 
assumptions? 

2) Carry out a random effects analysis by estimating the variance components,
testing whether the respective variance components are equal to zero, and
constructing 95% confidence intervals about the variance components.

3) Carry out the fixed effects analysis by testing the appropriate hypothesis,
estimating the means, constructing 95% confidence intervals about the
differences between the store means, and carrying out the appropriate multiple
comparisons of the fixed effects means using the Tukey method.


State   City   Loc1   Loc2   Conv1   Conv2   Chain1   Chain2
1       1      2.87   2.60   2.55    2.79    2.51     2.72
1       2      2.40   2.55   2.53    2.32    2.23     2.46
1       3      2.65   2.60   2.60    3.10    2.65     2.54
1       4      2.75   2.82   3.03    2.95    2.75     3.01
1       5      2.47   2.42   2.54    2.71    2.52     2.52
2       1      2.60   2.79   2.63    2.87    2.70     2.60
2       2      2.30   2.18   2.18    2.28    2.39     2.51
2       3      2.14   2.25   2.26    2.35    2.33     2.17
3       1      2.37   2.37   2.44    2.37    2.31     2.31
3       2      2.34   2.38   2.33    2.46    2.38     2.30
3       3      2.31   2.19   2.42    2.21    2.10     2.13
3       4      2.49   2.31   2.45    2.34    2.39     2.33
4       1      2.85   2.41   2.53    2.49    2.79     2.57
4       2      2.65   2.48   2.72    2.83    2.75     2.59
4       3      2.62   2.61   2.56    2.84    2.85     2.49
4       4      2.52   2.62   2.79    2.61    2.40     2.54
4       5      2.45   2.33   2.25    2.39    2.36     2.35
5       1      2.69   2.48   2.44    2.81    2.35     2.41
5       2      2.52   2.55   2.33    2.53    2.65     2.36
5       3      2.50   2.13   2.49    2.37    2.25     2.14

23.3 The data in the following table are from the data set for Exercise 23.2 where
some of the observations have been set to missing. Carry out the analysis of the
price of one pound of coffee by answering the following questions.

1) What type of model should be used to describe this data? What are the 
assumptions? 

2) Carry out a random effects analysis by estimating the variance components,
testing whether the respective variance components are equal to zero, and
constructing 95% confidence intervals about the variance components.

3) Carry out the fixed effects analysis by testing the appropriate hypothesis,
estimating the means, constructing 95% confidence intervals about the
differences between the store means, and carrying out the appropriate multiple
comparisons of the fixed effects means using the Tukey method.


State   City   Loc1   Loc2   Conv1   Conv2   Chain1   Chain2
1       1      2.87   2.60   —       2.79    —        —
1       2      2.40   2.55   2.53    2.32    2.23     2.46
1       3      —      2.60   2.60    —       2.65     2.54
1       4      2.75   2.82   3.03    2.95    —        3.01
1       5      2.47   —      2.54    —       2.52     2.52
2       1      —      2.79   2.63    2.87    2.70     —
2       2      2.30   —      2.18    —       —        2.51
2       3      2.14   —      2.26    —       —        —
3       1      2.37   2.37   2.44    2.37    2.31     2.31
3       2      2.34   2.38   2.33    2.46    —        2.30
3       3      —      —      2.42    —       2.10     2.13
3       4      2.49   2.31   —       —       —        2.33
4       1      2.85   2.41   2.53    —       —        2.57
4       2      2.65   —      —       —       2.75     2.59
4       3      —      —      2.56    2.84    2.85     —
4       4      2.52   2.62   —       2.61    2.40     2.54
4       5      —      2.33   2.25    2.39             —
5       1      —      2.48   —       —       2.35     2.41
5       2      2.52   2.55   2.33    2.53    2.65     2.36
5       3      2.50   —      2.49    2.37    2.25     —


23.4 Carry out an analysis for the data in Table 12.1 assuming that the flour blocking 
factor is a random effect. 

23.5 Carry out an analysis for the data in Table 15.1 assuming that the flour blocking 
factor is a random effect. 

23.6 Carry out an analysis of the data in Exercise 15.1 assuming that gyms are a 
random effect. 





Methods for Analyzing Split-Plot Type Designs 


24.1 Introduction 

The split-plot type design involves a design structure with more than one size of
experimental unit, where the smaller-size experimental units are nested within the
larger-size experimental units. Some examples of split-plot design structures were presented in
Chapter 5, including the class of hierarchical design structures. Two main problems occur in
the design and analysis of split-plot type design structures. The first problem consists of
the selection and/or identification of the different sizes of experimental units used in the
design structure, followed by the assignment of treatments from the treatment structure to
the different experimental unit sizes in the design structure. The successful identification
of the different sizes of experimental units is paramount in the specification of an
appropriate model that can describe the resulting data. The second problem is constructing the
appropriate model that describes the pertinent features of the treatment and design
structures. It is important to be able to identify the sources of variation that measure the
variability associated with each size of experimental unit. These sources of variability are
used to compute the respective error terms, which in turn are used to compute estimates of
the standard errors of estimated means and of pairwise comparisons among means. Since
these design structures involve more than one size of experimental unit, the estimates of
the standard errors of the fixed effect parameters and of comparisons among them
involve one or more sources of variation. A very important characteristic of the model for
the split-plot type designs is that it is the basic model for starting the construction
of the repeated measures models discussed in Chapter 26. Examples of several of the
concepts were presented in Chapter 5.

The design and analysis of split-plot or hierarchical design structures with two sizes of
experimental units are explained in Section 24.1 and the determination and estimation of
standard errors associated with the fixed effects are described in Section 24.2. A general
method for determining the appropriate standard errors and their estimates for fixed
effects parameter estimates in a general split-plot design structure is discussed in Section
24.3. The computations of standard errors of contrasts of means are discussed in Section
24.4. Four examples of split-plot design structures are presented in Section 24.6, where each
example demonstrates some salient features of analyzing such designs. A discussion of
sample size determination and power computations for split-plot design structures is
given in Section 24.7. Analyses using both SAS®-Mixed and JMP® are given in this chapter
with the JMP analyses shown in Section 24.8.

The key concepts in constructing models for split-plot designs are recognizing the
different sizes of experimental units and then identifying the corresponding design
structures and treatment structures. The overall model is constructed by incorporating models
developed for each size of experimental unit. Several examples of model construction were
presented in Chapter 5, but the assumptions underlying the models were not stated. The
assumptions are that the components denoting the error terms for the various
experimental units are all distributed independently with zero means and an associated variance
(see Chapter 26 for assumptions that are more general). Under ideal conditions, the error
terms are normally distributed. The objective of an analysis is to use the model
assumptions to obtain estimates of the population parameters and to make inferences about them.
Both method of moments and REML are used in the following examples to demonstrate
the computations needed to estimate the standard errors of the fixed effects. In practice,
REML is the method that can be recommended in most cases.

24.1.1 Example 24.1: Bread Recipes and Baking Temperatures 

The process of baking bread involves mixing a batch of bread dough according to the
specifications of a recipe, putting the dough into a pan (container), letting the bread rise,
and then putting the pan of bread dough into an oven to be baked at a specific temperature
and time combination. Each oven is large enough so that four pans of bread dough can be
put into one oven at the same time. An experiment was designed to evaluate how four
different recipes of bread respond to baking at three different temperatures, where the
response measured is the volume of the resulting loaf of bread. The process is to make
dough from each of the four recipes and place one loaf from each recipe into a single oven
that is set to a specific temperature. The loaves are left in the oven for a specified length
of time and then cooled to room temperature before their volumes are measured.
The data in Table 24.1 are the volumes of loaves of bread made from the four recipes
and three temperatures, where the process was repeated on three different days. Days are
considered as a blocking factor in the design structure.


TABLE 24.1

Loaf Volumes (cm^3) for Various Recipes and Temperatures for Example 24.1

Day   Temperature   Recipe A   Recipe B   Recipe C   Recipe D
1     325           1143       1148       1181       1165
1     340           1420       1425       1340       1404
1     355           1222       1166       1231       1274
2     325           1225       1208       1177       1193
2     340           1447       1402       1353       1414
2     355           1209       1293       1322       1285
3     325           1133       1115       1122       1142
3     340           1298       1261       1190       1321
3     355           1179       1175       1236       1257

[Figure: the three temperatures randomly assigned to the ovens within Day 1, Day 2, and Day 3]

FIGURE 24.1 Schematic showing the randomization of temperatures to ovens within each day.


The diagram in Figure 24.1 demonstrates the process of assigning temperatures to the
ovens; thus the ovens are the experimental units for levels of temperature (notice that the
recipes are not included in this step of the process). The design associated with the
oven-size experimental unit is a one-way treatment structure (three levels of temperature) in a
randomized complete block design structure (three days). A model that could be used to
describe the mean loaf volume of the four loaves within each oven (i.e., one observation
per oven) is

$$ y^{o}_{ik} = \mu^{o}_{i} + d^{o}_{k} + o^{o}_{ik} $$

where $y^{o}_{ik}$ denotes the observed mean loaf volume, $\mu^{o}_{i}$ denotes the mean loaf volume from
the $i$th level of temperature, $d^{o}_{k}$ denotes the random effect of the $k$th day, and $o^{o}_{ik}$ denotes the
random oven effect from the $i$th temperature and the $k$th day. It is assumed that $d^{o}_{k} \sim$ i.i.d.
$N(0, \sigma^2_{d^{o}})$, $o^{o}_{ik} \sim$ i.i.d. $N(0, \sigma^2_{o^{o}})$, and all of the $d^{o}_{k}$ and $o^{o}_{ik}$ are independent. The analysis of
variance associated with the oven model is displayed in Table 24.2, where the oven error term
is computed as the temperature by day interaction mean square.


TABLE 24.2

Analysis of Variance Table for the Oven Level Analysis

Source                             df
Day                                2
Temperature                        2
Error(oven) = Day x Temperature    4

[Figure: the four recipes randomly assigned to positions within one oven at a given temperature]

FIGURE 24.2 Schematic showing the randomization of recipes to positions within one oven within each day.


The diagram in Figure 24.2 shows the assignment of a recipe to a position within each
oven. The oven within a day is a block for the four recipes. Each recipe provides one loaf of
bread. The loaf design is a one-way treatment structure (four recipes) in a randomized
complete block design structure with nine blocks (three blocks or ovens on each of three days).

If all ovens were at the same temperature, then the data structure would be a one-way
treatment structure in a randomized complete block design structure. But not all ovens are
treated alike: three are set at 325°F, three are set at 340°F, and three are set at 355°F. Next,
simplify the design by considering all data from 325°F, as shown in Figure 24.3. The
resulting data are from a one-way treatment structure in a randomized complete block
design structure with three blocks. The error term for these data is the recipe by day
interaction. A model that can be used to describe the data from the 325°F ovens is

$$ y^{\ell}_{jk} = \mu^{\ell}_{j} + o^{\ell}_{k} + \varepsilon^{\ell}_{jk} $$

where $y^{\ell}_{jk}$ denotes the observed volume of the loaf in the 325°F oven of the $j$th recipe on the
$k$th day, $\mu^{\ell}_{j}$ denotes the mean loaf volume from the $j$th recipe, $o^{\ell}_{k}$ denotes the random effect
of the oven used on the $k$th day, and $\varepsilon^{\ell}_{jk}$ denotes the random loaf effect from the $j$th recipe
and the $k$th day. It is assumed that $o^{\ell}_{k} \sim$ i.i.d. $N(0, \sigma^2_{o^{\ell}})$, $\varepsilon^{\ell}_{jk} \sim$ i.i.d. $N(0, \sigma^2_{\varepsilon^{\ell}})$, and all $o^{\ell}_{k}$ and $\varepsilon^{\ell}_{jk}$ are
independently distributed. The analysis of variance associated with the loaf model is
displayed in Table 24.3, where the loaf error term is estimated by the recipe by day interaction
mean square.

The error term that measures the loaf to loaf variability is computed by pooling the
recipe by day interaction across the three temperatures, obtaining Error(loaf) = Recipe x
Day(Temperature). Putting the two models together provides the model

$$ y_{ijk} = \mu_{ij} + d_{k} + o_{ik} + \varepsilon_{ijk}, \quad i = 1, 2, 3, \; j = 1, 2, 3, 4, \; k = 1, 2, 3 $$

where $\mu_{ij} = \mu + T_i + R_j + (TR)_{ij}$, $T_i$ denotes the effect of the $i$th temperature, $R_j$ denotes the $j$th
recipe effect, $(TR)_{ij}$ denotes the temperature by recipe interaction, $d_k$ denotes the effect of the $k$th day, $o_{ik}$
denotes the oven effect for the $i$th temperature on the $k$th day, and $\varepsilon_{ijk}$ represents the error term. Under the ideal
conditions $d_k \sim$ i.i.d. $N(0, \sigma^2_{day})$, $o_{ik} \sim$ i.i.d. $N(0, \sigma^2_{oven})$, $\varepsilon_{ijk} \sim$ i.i.d. $N(0, \sigma^2_{loaf})$, and all $d_k$, $o_{ik}$, and $\varepsilon_{ijk}$
are independently distributed. The oven is the whole plot or larger-sized experimental
unit and the loaf is the subplot or split-plot or smaller-sized experimental unit.

The model can be expressed by size of experimental unit as

$$ \begin{aligned} y_{ijk} = {} & \mu + d_k + T_i + o_{ik} && \text{(whole-plot or oven part of the model)} \\ & + R_j + (TR)_{ij} + \varepsilon_{ijk} && \text{(subplot or loaf part of the model)} \end{aligned} $$

Combining the analyses of variance tables in Tables 24.2 and 24.3 gives the analysis of
variance table for this model in Table 24.4. The expected mean squares dictate the
appropriate denominator for computing test statistics for the fixed effects. The Error(oven) is
used as the error to test for temperature main effects and the Error(loaf) is used as the
error to test for the recipe main effect and the temperature by recipe interaction effects.
The SAS-Mixed code and resulting analysis of variance table for the loaf volume data are


TABLE 24.3

Analysis of Variance Table for the Loaf Volume Data at 325°F

Source                        df
Day                           2
Recipe                        3
Error(loaf) = Day x Recipe    6

TABLE 24.4

Analysis of Variance Table for the Loaf Volume Data of Example 24.1

Source             df    Expected Mean Square
Day                2     $\sigma^2_{loaf} + 4\sigma^2_{oven} + 12\sigma^2_{day}$
Temperature (T)    2     $\sigma^2_{loaf} + 4\sigma^2_{oven} + \phi^2(T)$
Error(oven)        4     $\sigma^2_{loaf} + 4\sigma^2_{oven}$
Recipe (R)         3     $\sigma^2_{loaf} + \phi^2(R)$
T x R              6     $\sigma^2_{loaf} + \phi^2(T \times R)$
Error(loaf)        18    $\sigma^2_{loaf}$
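
The partition of the total sum of squares into the whole-plot and subplot sources above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `split_plot_anova` and the data layout are hypothetical, and the volumes are the whole-number values shown in Table 24.1, so the resulting sums of squares may differ slightly from those reported in Table 24.5.

```python
# Loaf volumes from Table 24.1, keyed by (day, temperature);
# each list holds recipes A, B, C, D in that order.
VOLUMES = {
    (1, 325): [1143, 1148, 1181, 1165],
    (1, 340): [1420, 1425, 1340, 1404],
    (1, 355): [1222, 1166, 1231, 1274],
    (2, 325): [1225, 1208, 1177, 1193],
    (2, 340): [1447, 1402, 1353, 1414],
    (2, 355): [1209, 1293, 1322, 1285],
    (3, 325): [1133, 1115, 1122, 1142],
    (3, 340): [1298, 1261, 1190, 1321],
    (3, 355): [1179, 1175, 1236, 1257],
}
DAYS = [1, 2, 3]
TEMPS = [325, 340, 355]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def split_plot_anova(volumes, days, temps, n_recipes):
    """Sums of squares and df for a balanced split plot (hypothetical helper)."""
    r, a, c = len(days), len(temps), n_recipes
    recipes = range(c)
    grand = mean(y for row in volumes.values() for y in row)
    day_m = {k: mean(y for t in temps for y in volumes[(k, t)]) for k in days}
    temp_m = {t: mean(y for k in days for y in volumes[(k, t)]) for t in temps}
    rec_m = {j: mean(volumes[(k, t)][j] for k in days for t in temps) for j in recipes}
    oven_m = {(k, t): mean(volumes[(k, t)]) for k in days for t in temps}
    cell_m = {(t, j): mean(volumes[(k, t)][j] for k in days) for t in temps for j in recipes}

    ss = {
        "Day": a * c * sum((day_m[k] - grand) ** 2 for k in days),
        "Temperature": r * c * sum((temp_m[t] - grand) ** 2 for t in temps),
        # Whole-plot error: day x temperature interaction of the oven means.
        "Error(oven)": c * sum(
            (oven_m[(k, t)] - day_m[k] - temp_m[t] + grand) ** 2
            for k in days for t in temps
        ),
        "Recipe": a * r * sum((rec_m[j] - grand) ** 2 for j in recipes),
        "T x R": r * sum(
            (cell_m[(t, j)] - temp_m[t] - rec_m[j] + grand) ** 2
            for t in temps for j in recipes
        ),
        # Subplot error: recipe x day interaction pooled across temperatures.
        "Error(loaf)": sum(
            (volumes[(k, t)][j] - cell_m[(t, j)] - oven_m[(k, t)] + temp_m[t]) ** 2
            for k in days for t in temps for j in recipes
        ),
    }
    df = {
        "Day": r - 1,
        "Temperature": a - 1,
        "Error(oven)": (r - 1) * (a - 1),
        "Recipe": c - 1,
        "T x R": (a - 1) * (c - 1),
        "Error(loaf)": a * (r - 1) * (c - 1),
    }
    return ss, df


ss, df = split_plot_anova(VOLUMES, DAYS, TEMPS, 4)
for source in ss:
    print(f"{source:12s} df={df[source]:2d}  SS={ss[source]:10.1f}  MS={ss[source] / df[source]:9.1f}")
```

Because the design is balanced, the six sums of squares add up to the total sum of squares and the degrees of freedom add up to 35.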



TABLE 24.5

Analysis of Variance Table and SAS-Mixed Code for the Loaf Volume Data of Example 24.1

proc mixed data=bread method=type3;
  class day temp recipe;
  model volume=temp|recipe/ddfm=kr;
  random day day*temp;
  lsmeans temp|recipe/diff;
run;

Source                 df   SS         MS         EMS                                                               F-Value   Pr > F
Temperature            2    228756.6   114378.3   Var(Residual) + 4 Var(day x temp) + Q(temp, temp x recipe)        27.92     0.0045
Recipe                 3    6041.4     2013.8     Var(Residual) + Q(recipe, temp x recipe)                          3.06      0.0547
Temperature x recipe   6    21790.3    3631.7     Var(Residual) + Q(temp x recipe)                                  5.52      0.0021
Day                    2    51777.7    25888.8    Var(Residual) + 4 Var(day x temp) + 12 Var(day)                   6.32      0.0578
Day x temperature      4    16385.7    4096.4     Var(Residual) + 4 Var(day x temp)                                 6.23      0.0025
Residual               18   11841.6    657.9      Var(Residual)                                                     —         —


in Table 24.5. There is indication of a significant temperature by recipe interaction 
(p = 0.0021), thus further comparisons among the temperature x recipe two-way means 
should follow (see Section 24.2). 
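
As a quick check on how the expected mean squares pick the denominators, the following sketch recomputes the F-ratios for the fixed effects from the mean squares reported in Table 24.5. The dictionary layout and the helper name `f_ratio` are illustrative assumptions, not part of the SAS output.

```python
# Mean squares and degrees of freedom as reported in Table 24.5.
MEAN_SQUARES = {
    "Temperature": (114378.3, 2),
    "Recipe": (2013.8, 3),
    "Temperature x recipe": (3631.7, 6),
    "Day x temperature": (4096.4, 4),  # Error(oven)
    "Residual": (657.9, 18),           # Error(loaf)
}

# Denominator for each fixed effect, read from the expected mean squares.
ERROR_TERM = {
    "Temperature": "Day x temperature",
    "Recipe": "Residual",
    "Temperature x recipe": "Residual",
}


def f_ratio(effect):
    """Return (F, numerator df, denominator df) for a fixed effect."""
    ms_effect, df_num = MEAN_SQUARES[effect]
    ms_error, df_den = MEAN_SQUARES[ERROR_TERM[effect]]
    return ms_effect / ms_error, df_num, df_den


for effect in ERROR_TERM:
    f, df_num, df_den = f_ratio(effect)
    print(f"{effect}: F = {f:.2f} on ({df_num}, {df_den}) df")
```

The temperature test uses the oven error with 4 denominator degrees of freedom, while the recipe and interaction tests use the loaf error with 18.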


24.1.2 Example 24.2: Wheat Varieties Grown in Different Fertility Regimes 

The data in Figure 24.4 are the yields in pounds of two varieties of wheat ($B_1$ and $B_2$) grown
in four different fertility regimes ($A_1$, $A_2$, $A_3$, and $A_4$). The field was divided into two blocks,
each with four whole plots. Each of the four fertilizer levels was randomly assigned to one
whole plot within each block. Thus, the whole plot design consists of a one-way treatment
structure (four levels of fertilizer) in a randomized complete block design structure with
two blocks. Each block contains four whole plot experimental units which were split into
two parts (called subplots). Each variety of wheat was randomly assigned to one subplot
within each whole plot. The subplot design consists of a one-way treatment structure
(two varieties) in a randomized complete block design with eight blocks where each block
contains two subplot experimental units. A model that can be used to describe these data is

$$ y_{ijk} = \mu_{ij} + b_k + w_{ik} + \varepsilon_{ijk}, \quad i = 1, 2, 3, 4, \; j = 1, 2, \; k = 1, 2 $$

                       Block 1
Fertility regime   Variety V1   Variety V2
A1                 35.4         37.9
A2                 36.7         38.2
A3                 34.8         36.4
A4                 39.5         40.0

[Block 2 panel: fertility regime by variety, with a whole plot and a subplot marked]

FIGURE 24.4 Data for the variety by fertility regime split-plot example.


where $\mu_{ij}$ denotes the expected response (yield) for the $i$th fertility level and $j$th variety,
$y_{ijk}$ denotes the observed yield (response) from the $k$th block with the $i$th fertility level and
$j$th variety, $b_k$ denotes the block effect, which is assumed to be distributed i.i.d. $N(0, \sigma^2_{block})$, $w_{ik}$
denotes the whole plot error, which is assumed to be distributed i.i.d. $N(0, \sigma^2_w)$, and $\varepsilon_{ijk}$
denotes the subplot error, which is assumed to be distributed i.i.d. $N(0, \sigma^2_\varepsilon)$. It is also
assumed that all of the $b_k$, $w_{ik}$, and $\varepsilon_{ijk}$ are distributed independently. The mean response
can be expressed using an effects model representation as $\mu_{ij} = \mu + F_i + V_j + (FV)_{ij}$. The
analysis of variance table for this effects model is in Table 24.6. The denominators for the F-tests
for the three fixed effect comparisons are determined by the expected mean squares; that
is, the Error(whole plot) is used to test for fertility main effects and the Error(subplot) is used
to test for variety main effects and fertility by variety interaction effects. The SAS-Mixed
code and the numerical results using type III sums of squares are given in Table 24.7, and
results using the REML option are given in Table 24.8. The F-tests for the fixed effects are
identical for these two analyses since the data set is balanced and both estimates of the
variance components are greater than zero.


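TABLE 24.6

Analysis of Variance Table for the Wheat Yield Data of Example 24.2

Source               df   Expected Mean Square
Block                1    $\sigma^2_\varepsilon + 2\sigma^2_w + 8\sigma^2_{block}$
Fertility (F)        3    $\sigma^2_\varepsilon + 2\sigma^2_w + \phi^2(F)$
Error(whole plot)    3    $\sigma^2_\varepsilon + 2\sigma^2_w$
Variety (V)          1    $\sigma^2_\varepsilon + \phi^2(V)$
F x V                3    $\sigma^2_\varepsilon + \phi^2(F \times V)$
Error(subplot)       4    $\sigma^2_\varepsilon$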





TABLE 24.7

Analysis of Variance Table Using Type III Sums of Squares with the Wheat Yield Data
of Example 24.2

proc mixed data=ex24_1 method=type3;
  class block a b;
  model y=a|b/ddfm=kr;
  random block a*block;
  lsmeans a|b/diff;
run;

Source      df   SS      MS      EMS                                                F-Value   Pr > F
a           3    40.2    13.4    Var(Residual) + 2 Var(block x a) + Q(a, a x b)     5.80      0.0914
b           1    2.3     2.3     Var(Residual) + Q(b, a x b)                        1.07      0.3599
a x b       3    1.6     0.5     Var(Residual) + Q(a x b)                           0.25      0.8612
Block       1    131.1   131.1   Var(Residual) + 2 Var(block x a) + 8 Var(block)    56.77     0.0048
Block x a   3    6.9     2.3     Var(Residual) + 2 Var(block x a)                   1.10      0.4476
Residual    4    8.4     2.1     Var(Residual)                                      —         —


TABLE 24.8

Analysis of Variance Table Using REML with the Wheat Yield Data of Example 24.2

proc mixed data=ex24_1;
  class block a b;
  model y=a|b/ddfm=kr;
  random block a*block;
  lsmeans a|b/diff;
run;

Covariance Parameter Estimates

Covariance Parameter   Estimate
Block                  16.0992
Block x a              0.1008
Residual               2.1075

Type III Tests of Fixed Effects

Effect   Num df   Den df   F-Value   Pr > F
a        3        3        5.80      0.0914
b        1        4        1.07      0.3599
a x b    3        4        0.25      0.8612


24.2 Model Definition and Parameter Estimation 

The general model for the split-plot design with a randomized complete block whole plot
design structure with $r$ blocks, the whole-plot factor (A) with $a$ levels and the subplot factor
(C) with $c$ levels is

$$ y_{ijk} = \mu + \alpha_i + b_k + w_{ik} + \gamma_j + (\alpha\gamma)_{ij} + \varepsilon_{ijk}, \quad i = 1, 2, \ldots, a, \; j = 1, 2, \ldots, c, \; k = 1, 2, \ldots, r $$

where $y_{ijk}$ is the observed response, $b_k$ denotes the $k$th block effect, which is assumed to be
distributed $N(0, \sigma^2_b)$, $w_{ik}$ denotes the whole plot error, which is assumed to be distributed
$N(0, \sigma^2_w)$, and $\varepsilon_{ijk}$ denotes the subplot error, which is assumed to be distributed $N(0, \sigma^2_\varepsilon)$. It is
also assumed that all of the $b_k$, $w_{ik}$, and $\varepsilon_{ijk}$ are distributed independently. It should be noted
that this independence is the most important of the assumptions.
Fortunately, this assumption can be guaranteed by the randomization
process as the fixed effect factors are randomly assigned to their appropriately sized
experimental units. The fixed effects in this model are the overall mean, $\mu$, the effect of the whole
plot factor (A), $\alpha_i$, the effect of the subplot factor (C), $\gamma_j$, and the effect of the interaction
between the levels of the whole plot factor and the levels of the subplot factor, $(\alpha\gamma)_{ij}$. The
means model can be represented in terms of the effects model as $\mu_{ij} = \mu + \alpha_i + \gamma_j + (\alpha\gamma)_{ij}$.
The analysis of variance table for this general model with sources of variability, degrees of
freedom and expected mean squares is in Table 24.9.

The whole plot error is computed as the Block x A interaction and the subplot error is
computed as the Block x C interaction pooled across the levels of A, denoted as Block x C(A).
The equations from which to obtain the method of moments estimates of the variance
components are

$$ MSBlock = \sigma^2_\varepsilon + c\sigma^2_w + ac\sigma^2_b $$

$$ MSError(whole\ plot) = \sigma^2_\varepsilon + c\sigma^2_w $$

and

$$ MSError(subplot) = \sigma^2_\varepsilon $$

The method of moments solution to these equations is

$$ \tilde\sigma^2_\varepsilon = MSError(subplot) $$

$$ \tilde\sigma^2_w = \frac{MSError(whole\ plot) - MSError(subplot)}{c} $$

and

$$ \tilde\sigma^2_b = \frac{MSBlock - MSError(whole\ plot)}{ac} $$

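TABLE 24.9

Analysis of Variance Table for the General Split-Plot Model in Section 24.1

Source               df                   Expected Mean Square
Block                r - 1                $\sigma^2_\varepsilon + c\sigma^2_w + ac\sigma^2_b$
A                    a - 1                $\sigma^2_\varepsilon + c\sigma^2_w + \phi^2(\alpha)$
Error(whole plot)    (r - 1)(a - 1)       $\sigma^2_\varepsilon + c\sigma^2_w$
C                    c - 1                $\sigma^2_\varepsilon + \phi^2(\gamma)$
A x C                (a - 1)(c - 1)       $\sigma^2_\varepsilon + \phi^2(\alpha\gamma)$
Error(subplot)       a(r - 1)(c - 1)      $\sigma^2_\varepsilon$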
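
The method of moments estimates of the variance components are $\hat\sigma^2_\varepsilon = \tilde\sigma^2_\varepsilon$,

$$ \hat\sigma^2_w = \begin{cases} \tilde\sigma^2_w & \text{if } \tilde\sigma^2_w > 0 \\ 0 & \text{if } \tilde\sigma^2_w \le 0 \end{cases} $$

and

$$ \hat\sigma^2_b = \begin{cases} \tilde\sigma^2_b & \text{if } \tilde\sigma^2_b > 0 \\ 0 & \text{if } \tilde\sigma^2_b \le 0 \end{cases} $$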
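
The method of moments solution and the truncation at zero can be written as a small function. The sketch below is illustrative only: the function name `method_of_moments` is an assumption, and the inputs are the mean squares reported for Example 24.1 in Table 24.5, with days as blocks, a = 3 temperatures, and c = 4 recipes.

```python
def method_of_moments(ms_block, ms_whole, ms_sub, a, c):
    """Return (subplot, whole-plot, block) variance component estimates.

    Negative solutions of the moment equations are replaced by zero.
    """
    sigma2_eps = ms_sub
    sigma2_w = (ms_whole - ms_sub) / c
    sigma2_b = (ms_block - ms_whole) / (a * c)
    return sigma2_eps, max(sigma2_w, 0.0), max(sigma2_b, 0.0)


# Example 24.1: MSDay, MSError(oven), MSError(loaf) from Table 24.5.
loaf, oven, day = method_of_moments(25888.8, 4096.4, 657.9, a=3, c=4)
print(f"loaf = {loaf:.1f}, oven = {oven:.3f}, day = {day:.2f}")
```

For these mean squares every solution is positive, so the truncation does not change the estimates.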

The estimators of $\mu_{ij}$, $\bar\mu_{i\cdot}$, and $\bar\mu_{\cdot j}$ are $\bar y_{ij\cdot}$, $\bar y_{i\cdot\cdot}$, and $\bar y_{\cdot j\cdot}$, respectively. The comparisons
among the levels of A are between whole plot comparisons and the proper F-statistic for
testing the equality of levels of A means is $F_A$ = MSA/MSError(whole plot). The comparisons
among the levels of C and for the A x C interaction are within whole plot comparisons
or between subplot comparisons within a whole plot and the proper F-statistics are
$F_C$ = MSC/MSError(subplot) and $F_{A \times C}$ = MSA x C/MSError(subplot). These F-statistics were
constructed by looking at the expected mean squares in Table 24.9.

Once the F-tests have been computed to determine if there are significant differences 
between means, the next step is to carry out multiple comparisons to determine where the 
differences occur. The following section presents methods to compute standard errors of 
various differences of means for split-plot designs. 


24.3 Standard Errors for Comparisons among Means 

Contrasts of treatment means or treatment combination means are used to study the
treatment effects, particularly when the analysis of variance indicates that one or more of the
fixed effects are significantly different from zero. The standard error of a contrast of sample
means is necessary to determine if a contrast in the means is equal to zero or to construct
a confidence interval about the contrast in the means. More often than not, contrasts involve
a comparison of two means. Consequently, comparisons of two means are discussed in
this section and general contrasts are discussed in Section 24.4.

To demonstrate methods for determining appropriate standard errors, the general split-plot
design model is used where the whole plot design is a one-way treatment structure in a
randomized complete block design and the subplot treatment structure involves a one-way
treatment structure. The corresponding effects model for this situation can be expressed as

$$ y_{ijk} = \mu + \alpha_i + b_k + w_{ik} + \gamma_j + (\alpha\gamma)_{ij} + \varepsilon_{ijk}, \quad i = 1, 2, \ldots, a, \; j = 1, 2, \ldots, c, \; k = 1, 2, \ldots, r $$

A means model can be expressed as

$$ y_{ijk} = \mu_{ij} + b_k + w_{ik} + \varepsilon_{ijk}, \quad i = 1, 2, \ldots, a, \; j = 1, 2, \ldots, c, \; k = 1, 2, \ldots, r $$

where the terms are as described in Section 24.1.
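
Four types of comparisons may be of interest, depending on whether any interaction exists
between the levels of A and the levels of C. If there is no interaction, then it is of interest to
compare the levels of A to one another and to compare the levels of C to one another.

To compare the levels of C, one needs to compare two of the $\bar\mu_{\cdot j}$, which are estimated
by the $\bar y_{\cdot j\cdot}$. The process of determining the appropriate standard error involves expressing
$\bar y_{\cdot j\cdot}$ in terms of the quantities in the model obtained by averaging over $i$ and $k$. The model
for the $j$th main effect mean of C is $\bar y_{\cdot j\cdot} = \bar\mu_{\cdot j} + \bar b_{\cdot} + \bar w_{\cdot\cdot} + \bar\varepsilon_{\cdot j\cdot}$. Consider the difference $\bar\mu_{\cdot 1} - \bar\mu_{\cdot 2}$.
The estimate of $\bar\mu_{\cdot 1} - \bar\mu_{\cdot 2}$ is $\bar y_{\cdot 1\cdot} - \bar y_{\cdot 2\cdot}$, which can be expressed in terms of the C mean model
as $\bar y_{\cdot 1\cdot} - \bar y_{\cdot 2\cdot} = \bar\mu_{\cdot 1} - \bar\mu_{\cdot 2} + \bar\varepsilon_{\cdot 1\cdot} - \bar\varepsilon_{\cdot 2\cdot}$ as the terms involving $\bar b_{\cdot}$ and $\bar w_{\cdot\cdot}$ cancel out; that is, the
comparison $\bar y_{\cdot 1\cdot} - \bar y_{\cdot 2\cdot}$ does not depend on the whole plot error, nor does it depend on the
block error. The variance of $\bar y_{\cdot 1\cdot} - \bar y_{\cdot 2\cdot}$ can be shown to be equal to

$$ \mathrm{Var}(\bar y_{\cdot 1\cdot} - \bar y_{\cdot 2\cdot}) = \mathrm{Var}(\bar\varepsilon_{\cdot 1\cdot} - \bar\varepsilon_{\cdot 2\cdot}) = \frac{2\sigma^2_\varepsilon}{ar} $$

where the variance of the mean $\bar\varepsilon_{\cdot j\cdot}$ is $\mathrm{Var}(\bar\varepsilon_{\cdot j\cdot}) = \sigma^2_\varepsilon/(ar)$ and where $ar$ is the number of
observations in the mean. Similarly, one obtains $\mathrm{Var}(\bar y_{\cdot j\cdot} - \bar y_{\cdot j'\cdot}) = 2\sigma^2_\varepsilon/(ar)$ for any $j \ne j'$. The
estimate of the standard error of $\bar y_{\cdot j\cdot} - \bar y_{\cdot j'\cdot}$ is

$$ \mathrm{s.e.}(\bar y_{\cdot j\cdot} - \bar y_{\cdot j'\cdot}) = \sqrt{\frac{2\,MSError(subplot)}{ar}} \quad \text{for all } j \ne j' $$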


which is based on $a(c - 1)(r - 1)$ degrees of freedom. If one wants to carry out multiple
comparisons (see Chapter 3), $a(c - 1)(r - 1)$ is the number of degrees of freedom needed to
be used when determining the percentage point of the desired multiple comparison
procedure. For simplicity, the LSD values are computed, but the LSD values may not be the
appropriate method for a given situation. The LSD value for comparing two subplot
treatment means is

$$ LSD_\alpha = [t_{\alpha/2,\, a(c-1)(r-1)}]\ \mathrm{s.e.}(\bar y_{\cdot j\cdot} - \bar y_{\cdot j'\cdot}) $$

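To compare the levels of A, one needs to compare the $\bar\mu_{i\cdot}$, which are estimated by the $\bar y_{i\cdot\cdot}$.
The quantity $\bar y_{i\cdot\cdot}$ can be expressed in terms of the general model by averaging over $j$ and $k$,
obtaining $\bar y_{i\cdot\cdot} = \bar\mu_{i\cdot} + \bar b_{\cdot} + \bar w_{i\cdot} + \bar\varepsilon_{i\cdot\cdot}$.

The estimate of the contrast $\bar\mu_{1\cdot} - \bar\mu_{2\cdot}$ is $\bar y_{1\cdot\cdot} - \bar y_{2\cdot\cdot}$, which can be expressed in terms of the
$\bar y_{i\cdot\cdot}$ model as $\bar y_{1\cdot\cdot} - \bar y_{2\cdot\cdot} = \bar\mu_{1\cdot} - \bar\mu_{2\cdot} + \bar w_{1\cdot} - \bar w_{2\cdot} + \bar\varepsilon_{1\cdot\cdot} - \bar\varepsilon_{2\cdot\cdot}$. This comparison depends on both
the whole plot and subplot variance components. The variance of $\bar y_{1\cdot\cdot} - \bar y_{2\cdot\cdot}$ is

$$ \mathrm{Var}(\bar y_{1\cdot\cdot} - \bar y_{2\cdot\cdot}) = \mathrm{Var}(\bar w_{1\cdot} - \bar w_{2\cdot} + \bar\varepsilon_{1\cdot\cdot} - \bar\varepsilon_{2\cdot\cdot}) = \frac{2(\sigma^2_\varepsilon + c\sigma^2_w)}{rc} $$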
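
The variance of the difference between two whole plot means follows from the variances of the two averaged error terms. A short sketch of the intermediate step, using only the independence and variance assumptions stated for the general model:

```latex
\begin{aligned}
\operatorname{Var}(\bar{w}_{i\cdot}) &= \frac{\sigma_w^2}{r}, \qquad
\operatorname{Var}(\bar{\varepsilon}_{i\cdot\cdot}) = \frac{\sigma_\varepsilon^2}{rc}, \\
\operatorname{Var}(\bar{y}_{1\cdot\cdot} - \bar{y}_{2\cdot\cdot})
  &= 2\left(\frac{\sigma_w^2}{r} + \frac{\sigma_\varepsilon^2}{rc}\right)
   = \frac{2\,(\sigma_\varepsilon^2 + c\,\sigma_w^2)}{rc}.
\end{aligned}
```

The factor of 2 appears because the two whole plot means share no whole plot or subplot error terms, so their variances add.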
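
The estimate of the standard error of $\bar y_{1\cdot\cdot} - \bar y_{2\cdot\cdot}$ is

$$ \mathrm{s.e.}(\bar y_{1\cdot\cdot} - \bar y_{2\cdot\cdot}) = \sqrt{\frac{2(\hat\sigma^2_\varepsilon + c\hat\sigma^2_w)}{rc}} = \sqrt{\frac{2\,MSError(whole\ plot)}{rc}} $$

which is based on $(r - 1)(a - 1)$ degrees of freedom. The LSD value for comparing two whole
plot treatment means is $LSD_\alpha = [t_{\alpha/2,\,(r-1)(a-1)}]\ \mathrm{s.e.}(\bar y_{1\cdot\cdot} - \bar y_{2\cdot\cdot})$.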

When there is a significant A x C interaction, comparisons usually must be based on the
set of two-way cell means. There are two different types of comparisons one must consider
when studying these cell means. The first type arises when two subplot treatment (C)
means are compared at the same level of a whole plot treatment (A), such as $\mu_{11} - \mu_{12}$. The
best estimator of $\mu_{11} - \mu_{12}$ is $\bar y_{11\cdot} - \bar y_{12\cdot}$. The term $\bar y_{1j\cdot}$ can be expressed in terms of the general
model by averaging over $k$ as $\bar y_{1j\cdot} = \mu_{1j} + \bar b_{\cdot} + \bar w_{1\cdot} + \bar\varepsilon_{1j\cdot}$, and the estimate $\bar y_{11\cdot} - \bar y_{12\cdot}$ can be
expressed as $\bar y_{11\cdot} - \bar y_{12\cdot} = \mu_{11} - \mu_{12} + \bar\varepsilon_{11\cdot} - \bar\varepsilon_{12\cdot}$. The variance of $\bar y_{11\cdot} - \bar y_{12\cdot}$ is $\mathrm{Var}(\bar y_{11\cdot} - \bar y_{12\cdot}) =
\mathrm{Var}(\bar w_{1\cdot} + \bar\varepsilon_{11\cdot} - \bar w_{1\cdot} - \bar\varepsilon_{12\cdot}) = 2\sigma^2_\varepsilon/r$. Thus the variance of comparisons between subplot
treatments at the same level of the whole plot treatment depends only on the subplot error. The
estimate of the standard error of $\bar y_{11\cdot} - \bar y_{12\cdot}$ is

$$ \mathrm{s.e.}(\bar y_{11\cdot} - \bar y_{12\cdot}) = \sqrt{\frac{2\hat\sigma^2_\varepsilon}{r}} = \sqrt{\frac{2\,MSError(subplot)}{r}} $$

and the corresponding LSD value is $LSD_\alpha = [t_{\alpha/2,\, a(r-1)(c-1)}]\ \mathrm{s.e.}(\bar y_{11\cdot} - \bar y_{12\cdot})$. This LSD value can
be used to compare any pair of subplot treatments at the same level of a whole plot
treatment.

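The second type of comparison occurs when two whole plot treatments are compared at
the same level or different levels of the subplot treatments, such as $\mu_{11} - \mu_{21}$ or $\mu_{11} - \mu_{22}$.
These two types of comparisons have the same standard errors. The best estimate of
$\mu_{11} - \mu_{21}$ is $\bar y_{11\cdot} - \bar y_{21\cdot}$, which can be expressed in terms of the general model as $\bar y_{11\cdot} - \bar y_{21\cdot} =
\mu_{11} - \mu_{21} + \bar w_{1\cdot} - \bar w_{2\cdot} + \bar\varepsilon_{11\cdot} - \bar\varepsilon_{21\cdot}$. Then

$$ \mathrm{Var}(\bar y_{11\cdot} - \bar y_{21\cdot}) = \mathrm{Var}(\bar w_{1\cdot} - \bar w_{2\cdot} + \bar\varepsilon_{11\cdot} - \bar\varepsilon_{21\cdot}) = \frac{2\sigma^2_w}{r} + \frac{2\sigma^2_\varepsilon}{r} = \frac{2(\sigma^2_\varepsilon + \sigma^2_w)}{r} $$

This comparison depends on both the whole plot and the subplot variance components.

An unbiased estimate of $\sigma^2_\varepsilon + \sigma^2_w$ is

$$ \hat\sigma^2_\varepsilon + \hat\sigma^2_w = \frac{MSError(whole\ plot) + (c - 1)\,MSError(subplot)}{c} $$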
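
The combined estimate can be obtained directly from the method of moments solution in Section 24.2. A brief sketch of the algebra, writing MSE(wp) and MSE(sp) as shorthand for the whole plot and subplot error mean squares (the shorthand is introduced here only for this derivation):

```latex
\begin{aligned}
\hat\sigma_\varepsilon^2 + \hat\sigma_w^2
  &= \mathrm{MSE(sp)} + \frac{\mathrm{MSE(wp)} - \mathrm{MSE(sp)}}{c}
   = \frac{\mathrm{MSE(wp)} + (c-1)\,\mathrm{MSE(sp)}}{c}, \\
E\!\left[\frac{\mathrm{MSE(wp)} + (c-1)\,\mathrm{MSE(sp)}}{c}\right]
  &= \frac{(\sigma_\varepsilon^2 + c\,\sigma_w^2) + (c-1)\,\sigma_\varepsilon^2}{c}
   = \sigma_\varepsilon^2 + \sigma_w^2 .
\end{aligned}
```

Because the estimate is a weighted sum of two mean squares with different degrees of freedom, its degrees of freedom must be approximated.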





The sampling distribution associated with $\hat\sigma^2_\varepsilon + \hat\sigma^2_w$ is not a chi-square distribution, but is
a linear combination of chi-square distributions. The degrees of freedom associated with
$\hat\sigma^2_\varepsilon + \hat\sigma^2_w$ can be estimated using the Satterthwaite approximation as

$$ \nu = \frac{(\hat\sigma^2_\varepsilon + \hat\sigma^2_w)^2}{\dfrac{[MSE(whole\ plot)/c]^2}{(r - 1)(a - 1)} + \dfrac{[(c - 1)\,MSE(subplot)/c]^2}{a(r - 1)(c - 1)}} $$

An approximate LSD value for comparing two whole plot treatments at the same or
different subplot treatment is

$$ LSD_\alpha = (t_{\alpha/2,\,\nu}) \sqrt{\frac{2(\hat\sigma^2_\varepsilon + \hat\sigma^2_w)}{r}} $$

For the data in Example 24.1 (Table 24.1), the temperature means, recipe means, and temperature by recipe means are given in Table 24.10. The estimated standard errors for the four types of comparisons are computed as follows:

1) For comparing recipe main effect means: The estimated standard error is

$$\mathrm{s.e.}(\bar{y}_{\cdot 1\cdot} - \bar{y}_{\cdot 2\cdot}) = \sqrt{\frac{2(657.87)}{3(3)}} = 12.09$$

and this estimated standard error is based on 18 degrees of freedom. Note that $t_{0.025,18} = 2.101$, so the 5% LSD value for comparing recipe main effect means is $LSD_{0.05} = 2.101(12.09) = 25.40$.

2) For comparing temperature main effect means: The estimated standard error is

$$\mathrm{s.e.}(\bar{y}_{1\cdot\cdot} - \bar{y}_{2\cdot\cdot}) = \sqrt{\frac{2(4096.42)}{3(4)}} = 26.13$$

and this standard error is based on 4 degrees of freedom. Then $t_{0.025,4} = 2.776$ and the 5% LSD value for comparing temperature main effect means is $LSD_{0.05} = 2.776(26.13) = 72.55$.

TABLE 24.10
Means for Recipes, Temperatures and for Combinations of Recipe and Temperature for the Data in Table 24.1

Temperature    Recipe A    Recipe B    Recipe C    Recipe D    Temperature Mean
325            1167.09     1157.00     1160.02     1166.68     1162.70
340            1388.47     1362.76     1294.17     1379.76     1356.29
355            1203.39     1211.07     1263.17     1272.06     1237.42
Recipe mean    1252.98     1243.61     1239.12     1272.83

3) For comparing recipe means at the same temperature level: The estimated standard error is

$$\mathrm{s.e.}(\bar{y}_{11\cdot} - \bar{y}_{12\cdot}) = \sqrt{\frac{2(657.87)}{3}} = 20.94$$

and this estimated standard error is based on 18 degrees of freedom. Hence, the 5% LSD value for comparing two recipe means at the same temperature level is $LSD_{0.05} = 2.101(20.94) = 43.99$.

4) For comparing temperature means at the same or different recipe levels: The estimated standard error is

$$\mathrm{s.e.}(\bar{y}_{11\cdot} - \bar{y}_{21\cdot}) = \sqrt{\frac{2}{3}\left[\frac{1}{4}(4096.42) + \frac{(4-1)}{4}(657.87)\right]} = \sqrt{\frac{2}{3}\,[1517.51]} = 31.80$$

and it is based on

$$\frac{(1517.51)^2}{\dfrac{\left[\frac{1}{4}(4096.42)\right]^2}{4} + \dfrac{\left[\frac{3}{4}(657.87)\right]^2}{18}} = 8.35 \text{ degrees of freedom.}$$

Note that $t_{0.025,\,8.35} = 2.289$, and thus, the 5% LSD for comparing two means that are at different temperature levels is $LSD_{0.05} = 2.289(31.80) = 72.79$.
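
The four standard errors above can be checked with a few lines of arithmetic. The following is only a sketch: it assumes the two mean squares quoted in the text (4096.42 for the whole plot error, 657.87 for the subplot error) and the design sizes a = 3 temperatures, c = 4 recipes, and r = 3 blocks implied by the divisors; the variable and function names are illustrative, and the t quantiles are copied from the text rather than computed.

```python
import math

# Mean squares quoted in the text for Example 24.1.
MSE_WHOLEPLOT = 4096.42   # 4 degrees of freedom
MSE_SUBPLOT = 657.87      # 18 degrees of freedom
a, c, r = 3, 4, 3         # temperatures, recipes, blocks


def se_of_difference(variance_estimate, n_per_mean):
    """Standard error of a difference of two means: sqrt(2 * s^2 / n)."""
    return math.sqrt(2.0 * variance_estimate / n_per_mean)


# 1) recipe main effects, 2) temperature main effects,
# 3) recipes at the same temperature.
se_recipe = se_of_difference(MSE_SUBPLOT, a * r)
se_temperature = se_of_difference(MSE_WHOLEPLOT, r * c)
se_recipe_within_temp = se_of_difference(MSE_SUBPLOT, r)

# 4) temperatures at the same or different recipes: pooled variance estimate.
pooled = (MSE_WHOLEPLOT + (c - 1) * MSE_SUBPLOT) / c
se_temp_within_recipe = se_of_difference(pooled, r)

# Satterthwaite degrees of freedom for the pooled estimate.
df_wholeplot = (r - 1) * (a - 1)
df_subplot = a * (r - 1) * (c - 1)
nu = pooled ** 2 / (
    (MSE_WHOLEPLOT / c) ** 2 / df_wholeplot
    + ((c - 1) * MSE_SUBPLOT / c) ** 2 / df_subplot
)

# LSD for recipe main effects, using the t quantile quoted in the text.
lsd_recipe = 2.101 * se_recipe

print(f"s.e. recipe main effects:        {se_recipe:.2f}")
print(f"s.e. temperature main effects:   {se_temperature:.2f}")
print(f"s.e. recipes within temperature: {se_recipe_within_temp:.2f}")
print(f"s.e. temperatures within recipe: {se_temp_within_recipe:.2f}")
print(f"Satterthwaite df:                {nu:.2f}")
print(f"5% LSD, recipe main effects:     {lsd_recipe:.2f}")
```

Small differences in the last digit relative to the text (for example in the fourth standard error) are rounding effects.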

Next, a general process for computing standard errors for split-plot type designs is described.


24.4 A General Method for Computing Standard Errors of 
Differences of Means 

Applying the techniques presented above for computing standard errors for comparisons 
involving more than one size of experimental unit is not always straightforward. In this 
section, a general method for computing standard errors and approximate degrees of 
freedom values is described that can be applied to more complex situations. This section 
may be skipped by the casual reader. The method is described by using an example with 
the model 


$$y_{ijk} = \mu + \alpha_i + b_k + w_{ik} + \gamma_j + (\alpha\gamma)_{ij} + \varepsilon_{ijk}, \quad i = 1, 2, \ldots, a, \; j = 1, 2, \ldots, c, \; k = 1, 2, \ldots, r$$

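Basically, the technique consists of expressing a given comparison of means as the sum of components where each component is a comparison of means involving only one size of experimental unit. Then the components of the comparison are independently distributed, and the variance of the comparison is obtained by summing the variances of the components. For example, the comparison $\mu_{11} - \mu_{21}$ can be expressed as

$$\mu_{11} - \mu_{21} = (\bar{\mu}_{1\cdot} - \bar{\mu}_{2\cdot}) + [(\mu_{11} - \bar{\mu}_{1\cdot}) - (\mu_{21} - \bar{\mu}_{2\cdot})]$$

The component $(\bar{\mu}_{1\cdot} - \bar{\mu}_{2\cdot})$ is a whole plot comparison, and the component $[(\mu_{11} - \bar{\mu}_{1\cdot}) - (\mu_{21} - \bar{\mu}_{2\cdot})]$ is a subplot comparison. The estimates of these components are $\hat{\bar{\mu}}_{1\cdot} - \hat{\bar{\mu}}_{2\cdot} = \bar{y}_{1\cdot\cdot} - \bar{y}_{2\cdot\cdot}$ and $(\hat{\mu}_{11} - \hat{\bar{\mu}}_{1\cdot}) - (\hat{\mu}_{21} - \hat{\bar{\mu}}_{2\cdot}) = (\bar{y}_{11\cdot} - \bar{y}_{1\cdot\cdot}) - (\bar{y}_{21\cdot} - \bar{y}_{2\cdot\cdot})$, respectively. The estimate of $\mu_{11} - \mu_{21}$ is

$$\hat{\mu}_{11} - \hat{\mu}_{21} = \hat{\bar{\mu}}_{1\cdot} - \hat{\bar{\mu}}_{2\cdot} + (\hat{\mu}_{11} - \hat{\bar{\mu}}_{1\cdot}) - (\hat{\mu}_{21} - \hat{\bar{\mu}}_{2\cdot}) = \bar{y}_{11\cdot} - \bar{y}_{21\cdot}$$

Since comparisons computed from the whole plot part of the model are distributed independently of comparisons computed from the subplot part, the variance of $\hat{\mu}_{11} - \hat{\mu}_{21}$ is

$$\operatorname{Var}(\hat{\mu}_{11} - \hat{\mu}_{21}) = \operatorname{Var}(\bar{y}_{1\cdot\cdot} - \bar{y}_{2\cdot\cdot}) + \operatorname{Var}[(\bar{y}_{11\cdot} - \bar{y}_{1\cdot\cdot}) - (\bar{y}_{21\cdot} - \bar{y}_{2\cdot\cdot})]$$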


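The quantity $\bar{y}_{1\cdot\cdot} - \bar{y}_{2\cdot\cdot}$ expressed in terms of the general model is $\bar{y}_{1\cdot\cdot} - \bar{y}_{2\cdot\cdot} = \bar{\mu}_{1\cdot} - \bar{\mu}_{2\cdot} + \bar{w}_{1\cdot} - \bar{w}_{2\cdot} + \bar{\varepsilon}_{1\cdot\cdot} - \bar{\varepsilon}_{2\cdot\cdot}$ and its variance is

$$\operatorname{Var}(\bar{y}_{1\cdot\cdot} - \bar{y}_{2\cdot\cdot}) = \operatorname{Var}(\bar{w}_{1\cdot} - \bar{w}_{2\cdot} + \bar{\varepsilon}_{1\cdot\cdot} - \bar{\varepsilon}_{2\cdot\cdot}) = \frac{2\sigma_w^2}{r} + \frac{2\sigma_\varepsilon^2}{rc} = \frac{2}{rc}\left(\sigma_\varepsilon^2 + c\sigma_w^2\right)$$

An estimate of $\operatorname{Var}(\bar{y}_{1\cdot\cdot} - \bar{y}_{2\cdot\cdot})$ is $(2/rc)\,MSError(wholeplot)$.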

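The quantity $(\bar{y}_{11\cdot} - \bar{y}_{1\cdot\cdot}) - (\bar{y}_{21\cdot} - \bar{y}_{2\cdot\cdot})$ expressed in terms of the general model is

$$\begin{aligned}
(\bar{y}_{11\cdot} - \bar{y}_{1\cdot\cdot}) - (\bar{y}_{21\cdot} - \bar{y}_{2\cdot\cdot}) &= (\mu_{11} + \bar{b}_{\cdot} + \bar{w}_{1\cdot} + \bar{\varepsilon}_{11\cdot}) - (\bar{\mu}_{1\cdot} + \bar{b}_{\cdot} + \bar{w}_{1\cdot} + \bar{\varepsilon}_{1\cdot\cdot}) \\
&\quad - (\mu_{21} + \bar{b}_{\cdot} + \bar{w}_{2\cdot} + \bar{\varepsilon}_{21\cdot}) + (\bar{\mu}_{2\cdot} + \bar{b}_{\cdot} + \bar{w}_{2\cdot} + \bar{\varepsilon}_{2\cdot\cdot}) \\
&= [(\mu_{11} - \bar{\mu}_{1\cdot}) - (\mu_{21} - \bar{\mu}_{2\cdot})] + [(\bar{\varepsilon}_{11\cdot} - \bar{\varepsilon}_{1\cdot\cdot}) - (\bar{\varepsilon}_{21\cdot} - \bar{\varepsilon}_{2\cdot\cdot})]
\end{aligned}$$

and its variance is

$$\operatorname{Var}[(\bar{y}_{11\cdot} - \bar{y}_{1\cdot\cdot}) - (\bar{y}_{21\cdot} - \bar{y}_{2\cdot\cdot})] = \operatorname{Var}[(\bar{\varepsilon}_{11\cdot} - \bar{\varepsilon}_{1\cdot\cdot}) - (\bar{\varepsilon}_{21\cdot} - \bar{\varepsilon}_{2\cdot\cdot})] = \frac{2(c - 1)}{rc}\,\sigma_\varepsilon^2$$

An estimate of $\operatorname{Var}[(\bar{y}_{11\cdot} - \bar{y}_{1\cdot\cdot}) - (\bar{y}_{21\cdot} - \bar{y}_{2\cdot\cdot})]$ is

$$\widehat{\operatorname{Var}}[(\bar{y}_{11\cdot} - \bar{y}_{1\cdot\cdot}) - (\bar{y}_{21\cdot} - \bar{y}_{2\cdot\cdot})] = \frac{2(c - 1)}{rc}\,MSError(subplot)$$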
Combining the whole plot component variance and the subplot component variance yields

$$\operatorname{Var}[\bar{y}_{11\cdot} - \bar{y}_{21\cdot}] = \frac{2}{rc}\left(\sigma_\varepsilon^2 + c\sigma_w^2\right) + \frac{2(c - 1)}{rc}\,\sigma_\varepsilon^2$$

and the estimate of the standard error of $\bar{y}_{11\cdot} - \bar{y}_{21\cdot}$ is

$$\mathrm{s.e.}[\bar{y}_{11\cdot} - \bar{y}_{21\cdot}] = \sqrt{\frac{2}{rc}\,MSE(wholeplot) + \frac{2(c - 1)}{rc}\,MSE(subplot)}$$

The approximate number of degrees of freedom associated with $\mathrm{s.e.}[\bar{y}_{11\cdot} - \bar{y}_{21\cdot}]$ can be obtained using the Satterthwaite approximation as:

$$\hat{\nu} = \frac{\left\{\dfrac{2}{rc}\,[MSE(wholeplot)] + \dfrac{2(c - 1)}{rc}\,[MSE(subplot)]\right\}^2}{\dfrac{\left\{\frac{2}{rc}\,[MSE(wholeplot)]\right\}^2}{(a - 1)(r - 1)} + \dfrac{\left\{\frac{2(c - 1)}{rc}\,[MSE(subplot)]\right\}^2}{a(r - 1)(c - 1)}}$$

With some simple algebra one can show that the above $\mathrm{s.e.}[\bar{y}_{11\cdot} - \bar{y}_{21\cdot}]$ is identical to that obtained in Section 24.2.
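
The equivalence with Section 24.2 can also be confirmed numerically. The sketch below assumes the Example 24.1 mean squares and design sizes (a = 3, c = 4, r = 3); the function names are illustrative only. It builds the standard error from the whole plot and subplot components, compares it with the pooled form of Section 24.2, and evaluates the Satterthwaite degrees of freedom from the two component terms.

```python
import math


def se_by_components(mse_wholeplot, mse_subplot, r, c):
    """s.e. of ybar_11. - ybar_21. as whole plot part plus subplot part."""
    whole_plot_part = 2.0 / (r * c) * mse_wholeplot
    subplot_part = 2.0 * (c - 1) / (r * c) * mse_subplot
    return math.sqrt(whole_plot_part + subplot_part)


def se_section_24_2(mse_wholeplot, mse_subplot, r, c):
    """Same standard error written with the pooled variance estimate."""
    pooled = (mse_wholeplot + (c - 1) * mse_subplot) / c
    return math.sqrt(2.0 * pooled / r)


def satterthwaite_df(terms):
    """terms: list of (coefficient * mean square, degrees of freedom)."""
    total = sum(value for value, _ in terms)
    return total ** 2 / sum(value ** 2 / df for value, df in terms)


a, c, r = 3, 4, 3
mse_wp, mse_sp = 4096.42, 657.87

se_general = se_by_components(mse_wp, mse_sp, r, c)
se_direct = se_section_24_2(mse_wp, mse_sp, r, c)
nu = satterthwaite_df([
    (2.0 / (r * c) * mse_wp, (a - 1) * (r - 1)),
    (2.0 * (c - 1) / (r * c) * mse_sp, a * (r - 1) * (c - 1)),
])

print(f"component form: {se_general:.4f}")
print(f"pooled form:    {se_direct:.4f}")
print(f"Satterthwaite df: {nu:.2f}")
```

Because the Satterthwaite ratio is unchanged when every term is multiplied by the same constant, the degrees of freedom agree with the value found in Section 24.2.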


24.5 Comparison via General Contrasts 

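General contrasts of means can be constructed for each of the comparisons discussed in Section 24.2. A contrast between the levels of C or the $\bar{\mu}_{\cdot j}$ is

$$\theta = d_1\bar{\mu}_{\cdot 1} + d_2\bar{\mu}_{\cdot 2} + \cdots + d_c\bar{\mu}_{\cdot c} \quad \text{where} \quad \sum_{j=1}^{c} d_j = 0$$

An estimate of the contrast is $\hat{\theta} = d_1\bar{y}_{\cdot 1\cdot} + d_2\bar{y}_{\cdot 2\cdot} + \cdots + d_c\bar{y}_{\cdot c\cdot}$ with variance

$$\operatorname{Var}(\hat{\theta}) = \frac{\sigma_\varepsilon^2}{ar}\sum_{j=1}^{c} d_j^2$$

The estimate of the standard error of $\hat{\theta}$ is

$$\mathrm{s.e.}(\hat{\theta}) = \sqrt{\frac{MSError(subplot)}{ar}\sum_{j=1}^{c} d_j^2}$$

A contrast between the levels of A or the $\bar{\mu}_{i\cdot}$ is $\tau = h_1\bar{\mu}_{1\cdot} + h_2\bar{\mu}_{2\cdot} + \cdots + h_a\bar{\mu}_{a\cdot}$ where $\sum_{i=1}^{a} h_i = 0$; its estimate is $\hat{\tau} = h_1\bar{y}_{1\cdot\cdot} + h_2\bar{y}_{2\cdot\cdot} + \cdots + h_a\bar{y}_{a\cdot\cdot}$ with variance

$$\operatorname{Var}(\hat{\tau}) = \left(\frac{\sigma_\varepsilon^2 + c\sigma_w^2}{rc}\right)\sum_{i=1}^{a} h_i^2$$

The estimate of the standard error of $\hat{\tau}$ is

$$\mathrm{s.e.}(\hat{\tau}) = \sqrt{\frac{MSError(wholeplot)}{rc}\sum_{i=1}^{a} h_i^2}$$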


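A contrast of subplot treatments at the same level of a whole plot treatment, that is, a contrast of the $\mu_{i1}, \mu_{i2}, \ldots, \mu_{ic}$, is $\delta_i = s_1\mu_{i1} + s_2\mu_{i2} + \cdots + s_c\mu_{ic}$ where $\sum_{j=1}^{c} s_j = 0$. Its estimate is $\hat{\delta}_i = s_1\bar{y}_{i1\cdot} + s_2\bar{y}_{i2\cdot} + \cdots + s_c\bar{y}_{ic\cdot}$ with variance

$$\operatorname{Var}(\hat{\delta}_i) = \frac{\sigma_\varepsilon^2}{r}\sum_{j=1}^{c} s_j^2$$

The estimate of the standard error of $\hat{\delta}_i$ is

$$\mathrm{s.e.}(\hat{\delta}_i) = \sqrt{\frac{MSError(subplot)}{r}\sum_{j=1}^{c} s_j^2}$$

A contrast of whole plot treatments at the same subplot treatment, that is, a contrast of $\mu_{1j}, \mu_{2j}, \ldots, \mu_{aj}$, is $\lambda_j = u_1\mu_{1j} + u_2\mu_{2j} + \cdots + u_a\mu_{aj}$ where $\sum_{i=1}^{a} u_i = 0$. It is estimated by $\hat{\lambda}_j = u_1\bar{y}_{1j\cdot} + u_2\bar{y}_{2j\cdot} + \cdots + u_a\bar{y}_{aj\cdot}$, and the variance of $\hat{\lambda}_j$ is

$$\operatorname{Var}(\hat{\lambda}_j) = \left(\frac{\sigma_\varepsilon^2 + \sigma_w^2}{r}\right)\sum_{i=1}^{a} u_i^2$$

The estimated standard error of $\hat{\lambda}_j$ is

$$\mathrm{s.e.}(\hat{\lambda}_j) = \sqrt{\frac{\hat{\sigma}_\varepsilon^2 + \hat{\sigma}_w^2}{r}\sum_{i=1}^{a} u_i^2}$$

where

$$\hat{\sigma}_\varepsilon^2 + \hat{\sigma}_w^2 = \frac{MSError(wholeplot) + (c - 1)\,MSError(subplot)}{c}$$

The number of degrees of freedom associated with $\hat{\sigma}_\varepsilon^2 + \hat{\sigma}_w^2$ is obtained using the Satterthwaite approximation (see Section 24.2) as

$$\hat{\nu} = \frac{\left(\hat{\sigma}_\varepsilon^2 + \hat{\sigma}_w^2\right)^2}{\dfrac{[MSE(wholeplot)/c]^2}{(r - 1)(a - 1)} + \dfrac{\left[\frac{c-1}{c}\,MSE(subplot)\right]^2}{a(r - 1)(c - 1)}}$$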


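Next consider any linear combination of the $\mu_{ij}$ such as $\sum_{i=1}^{a}\sum_{j=1}^{c} g_{ij}\mu_{ij}$. From Chapter 8, we know that such a contrast is an interaction contrast if $\sum_{i=1}^{a} g_{ij} = 0$ for every $j$ and if $\sum_{j=1}^{c} g_{ij} = 0$ for every $i$. One can show that

$$\begin{aligned}
\operatorname{Var}\left(\sum_{i=1}^{a}\sum_{j=1}^{c} g_{ij}\bar{y}_{ij\cdot}\right) &= \operatorname{Var}\left(\sum_{i=1}^{a}\sum_{j=1}^{c}\left(g_{ij}\bar{b}_{\cdot} + g_{ij}\bar{w}_{i\cdot} + g_{ij}\bar{\varepsilon}_{ij\cdot}\right)\right) \\
&= \operatorname{Var}\left(\sum_{i=1}^{a}\sum_{j=1}^{c} g_{ij}\bar{b}_{\cdot}\right) + \operatorname{Var}\left(\sum_{i=1}^{a}\sum_{j=1}^{c} g_{ij}\bar{w}_{i\cdot}\right) + \operatorname{Var}\left(\sum_{i=1}^{a}\sum_{j=1}^{c} g_{ij}\bar{\varepsilon}_{ij\cdot}\right) \\
&= (g_{\cdot\cdot})^2\,\frac{\sigma_B^2}{r} + \frac{\sigma_w^2}{r}\sum_{i=1}^{a}(g_{i\cdot})^2 + \frac{\sigma_\varepsilon^2}{r}\sum_{i=1}^{a}\sum_{j=1}^{c}(g_{ij})^2
\end{aligned}$$

If $\sum_{i=1}^{a}\sum_{j=1}^{c} g_{ij}\mu_{ij}$ is an interaction contrast, then $g_{\cdot\cdot} = 0$ and every $g_{i\cdot} = 0$, so

$$\operatorname{Var}\left(\sum_{i=1}^{a}\sum_{j=1}^{c} g_{ij}\bar{y}_{ij\cdot}\right) = \frac{\sigma_\varepsilon^2}{r}\sum_{i=1}^{a}\sum_{j=1}^{c}(g_{ij})^2$$

and this variance can be estimated by

$$\widehat{\operatorname{Var}}\left(\sum_{i=1}^{a}\sum_{j=1}^{c} g_{ij}\bar{y}_{ij\cdot}\right) = \frac{MSError(subplot)}{r}\sum_{i=1}^{a}\sum_{j=1}^{c}(g_{ij})^2$$

and its corresponding degrees of freedom are $a(r - 1)(c - 1)$.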

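Next consider the $A_{i'}$ main effect mean defined by $\bar{\mu}_{i'\cdot}$. One can obtain this main effect mean as a special case of $\sum_{i=1}^{a}\sum_{j=1}^{c} g_{ij}\mu_{ij}$ by taking

$$g_{ij} = \begin{cases} \dfrac{1}{c} & \text{if } i = i', \text{ for } j = 1, 2, \ldots, c \\[4pt] 0 & \text{otherwise} \end{cases}$$

Then $(g_{\cdot\cdot})^2 = 1$, $\sum_{i=1}^{a}(g_{i\cdot})^2 = (g_{i'\cdot})^2 = 1$, and

$$\sum_{i=1}^{a}\sum_{j=1}^{c}(g_{ij})^2 = \sum_{j=1}^{c}(g_{i'j})^2 = \frac{1}{c}$$

Thus,

$$\operatorname{Var}(\bar{y}_{i'\cdot\cdot}) = \frac{1}{rc}\left(\sigma_\varepsilon^2 + c\sigma_w^2 + c\sigma_B^2\right)$$

Note that the variance of the estimate of an A main effect mean depends on the block variance component as well as the whole plot and subplot variance components. Similarly, taking $g_{ij} = 1/a$ for $j = j'$ and zero otherwise gives $(g_{\cdot\cdot})^2 = 1$, $\sum_{i}(g_{i\cdot})^2 = 1/a$, and $\sum_{i}\sum_{j}(g_{ij})^2 = 1/a$, so the variance of the estimate of a C main effect mean, $\bar{y}_{\cdot j'\cdot}$, is

$$\operatorname{Var}(\bar{y}_{\cdot j'\cdot}) = \frac{1}{ra}\left(\sigma_\varepsilon^2 + \sigma_w^2 + a\sigma_B^2\right)$$

Finally, the estimate of $\mu_{ij}$ is $\bar{y}_{ij\cdot}$, and its variance is

$$\operatorname{Var}(\bar{y}_{ij\cdot}) = \frac{1}{r}\left(\sigma_\varepsilon^2 + \sigma_w^2 + \sigma_B^2\right)$$

Note that the degrees of freedom associated with each of the estimates of these functions of the three variance components will need to be estimated by the Satterthwaite method.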

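Many researchers use effects models rather than means models to describe their data. It is important to be able to express contrasts of means as their corresponding contrasts in effects model parameters. For example, the expression for $\delta_i$ is

$$\begin{aligned}
\delta_i &= \sum_{j=1}^{c} s_j\mu_{ij} \\
&= \sum_{j=1}^{c} s_j\left[\mu + \alpha_i + \gamma_j + (\alpha\gamma)_{ij}\right] \\
&= \sum_{j=1}^{c} s_j\mu + \sum_{j=1}^{c} s_j\alpha_i + \sum_{j=1}^{c} s_j\gamma_j + \sum_{j=1}^{c} s_j(\alpha\gamma)_{ij} \\
&= \mu\sum_{j=1}^{c} s_j + \alpha_i\sum_{j=1}^{c} s_j + \sum_{j=1}^{c} s_j\gamma_j + \sum_{j=1}^{c} s_j(\alpha\gamma)_{ij} \\
&= \sum_{j=1}^{c} s_j\gamma_j + \sum_{j=1}^{c} s_j(\alpha\gamma)_{ij}, \quad \text{since } \sum_{j=1}^{c} s_j = 0
\end{aligned}$$

Thus, this contrast of the means involves both the $\gamma_j$ and the $(\alpha\gamma)_{ij}$. Examples of such contrasts using SAS-Mixed will be illustrated later in this chapter.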

The estimates and estimated standard errors given above can be used to test hypotheses 
via t-tests and/or construct confidence intervals about estimable functions of the fixed 
effect parameters. There are many possible choices for contrasts of interest. For example, if 
the levels of a factor are quantitative, one can use the linear, quadratic, and so on, contrasts 
to investigate possible trends in the mean parameters. Other contrasts could involve 
comparing a set of controls to each of the treatments. 
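
The general variance expression for a linear combination of the cell means can be explored numerically. The sketch below is an illustration only: the function name and the variance component values (2, 3, and 5) are hypothetical, chosen so the results are easy to check by hand; the formula is the one given earlier in this section for $\sum_i\sum_j g_{ij}\bar{y}_{ij\cdot}$.

```python
def variance_of_combination(g, r, var_block, var_wholeplot, var_subplot):
    """Variance of sum_ij g[i][j] * ybar_ij. under the split-plot model.

    Var = [ g..^2 * var_block + sum_i g_i.^2 * var_wholeplot
            + sum_ij g_ij^2 * var_subplot ] / r
    """
    grand_total = sum(sum(row) for row in g)
    row_totals = [sum(row) for row in g]
    sum_of_squares = sum(x * x for row in g for x in row)
    return (
        grand_total ** 2 * var_block
        + sum(t * t for t in row_totals) * var_wholeplot
        + sum_of_squares * var_subplot
    ) / r


a, c, r = 3, 4, 3
# Hypothetical variance components, for illustration only.
var_block, var_wp, var_sub = 2.0, 3.0, 5.0

# Interaction contrast: every row sum and every column sum is zero.
interaction = [[1, -1, 0, 0], [-1, 1, 0, 0], [0, 0, 0, 0]]
# A main effect mean (first level of A): g = 1/c across the first row.
a_mean = [[1 / c] * c, [0] * c, [0] * c]
# C main effect mean (first level of C): g = 1/a down the first column.
c_mean = [[1 / a, 0, 0, 0] for _ in range(a)]

v_interaction = variance_of_combination(interaction, r, var_block, var_wp, var_sub)
v_a_mean = variance_of_combination(a_mean, r, var_block, var_wp, var_sub)
v_c_mean = variance_of_combination(c_mean, r, var_block, var_wp, var_sub)

print(v_interaction, v_a_mean, v_c_mean)
```

Only the subplot component survives for the interaction contrast, whereas both main effect means pick up the block and whole plot components.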
24.6 Additional Examples 

24.6.1 Example 24.3: Moisture and Fertilizer 

The data in Table 24.11 are from an experiment where the amount of dry matter was measured on wheat plants grown in different levels of moisture and with different amounts of fertilizer. There were 48 different peat pots and 12 plastic trays; four pots could be put into each tray. The moisture treatments consisted of adding 10, 20, 30, or 40 ml of water per pot per day to the tray, where the water was absorbed by the peat pots. The levels of moisture were randomly assigned to the trays. The trays are the large size of experimental unit or whole plot, and the whole plot design is a one-way treatment structure (the four levels of moisture) in a completely randomized design structure. The levels of fertilizer were 2, 4, 6, or 8 mg per pot. The four levels of fertilizer were randomly assigned to the four pots in each tray so that each fertilizer level occurred once in each tray. The pot is the smallest size of experimental unit or split-plot or subplot, and the subplot design is a one-way treatment structure (the four levels of fertilizer) in a randomized complete block design structure where the 12 trays are the blocks. The wheat seeds were planted in each pot and after 30 days the dry matter of the wheat plants growing in each pot was measured. A model that can be used to describe the dry matter from a pot in the kth tray assigned to the ith level of moisture and jth level of fertilizer is

$$y_{ijk} = \mu_{ij} + t_{ik} + p_{ijk}, \quad i = 1, 2, 3, 4; \; j = 1, 2, 3, 4; \; k = 1, 2, 3$$

where $\mu_{ij}$ is the mean dry matter of level $i$ of moisture with level $j$ of fertilizer, $t_{ik}$ is the tray error term distributed i.i.d. $N(0, \sigma_{tray}^2)$, and $p_{ijk}$ is the pot error term distributed i.i.d. $N(0, \sigma_{pot}^2)$. Note that $p_{ijk}$ is equivalent to the residual error in this model. The analysis of variance table is in Table 24.12, which shows a significant Moisture x Fertilizer interaction. Because there is an interaction between moisture and fertilizer, the treatment combination means, which are given in Table 24.13, are used when making inferences. Since the levels of moisture and the levels of fertilizer are equally spaced quantitative levels, orthogonal 


TABLE 24.11
Dry Matter Measurements per Pot for Example 24.3

MST    Tray    Fertilizer 2    Fertilizer 4    Fertilizer 6    Fertilizer 8
10     1        3.3458          4.3170          4.5572          5.8794
10     2        4.0444          4.1413          6.5173          7.3776
10     3        1.9758          3.8397          4.4730          5.1180
20     1        5.0490          7.9419         10.7697         13.5168
20     2        5.9131          8.5129         10.3934         13.9157
20     3        6.9511          7.0265         10.9334         15.2750
30     1        6.5693         10.7348         12.2626         15.7133
30     2        8.2974          8.9081         13.4373         14.9575
30     3        5.2785          8.6654         11.1372         15.6332
40     1        6.8393          9.0842         10.3654         12.5144
40     2        6.4997          6.0702         10.7486         12.5034
40     3        4.0482          3.8376          9.4367         10.2811
TABLE 24.12
Analysis of Variance Table for Dry Matter for the Moisture and Fertilizer Example

proc mixed cl covtest method=type3 data=ex_243;
class mst tray fr;
model dry_matter=mst|fr / ddfm=KR;
random tray(mst);

Source       df    Sum of Squares    Mean Square    F-Value    Pr > F     Expected Mean Square
MST           3    269.189429        89.729810       26.34     0.0002     Var(Residual) + 4 Var[TRAY(MST)] + Q(MST, MST x FR)
FR            3    297.054856        99.018285      131.65     <0.0001    Var(Residual) + Q(FR, MST x FR)
MST x FR      9     38.056379         4.228487        5.62     0.0003     Var(Residual) + Q(MST x FR)
TRAY(MST)     8     27.251576         3.406447        4.53     0.0019     Var(Residual) + 4 Var[TRAY(MST)]
Residual     24     18.051379         0.752141                            Var(Residual)


TABLE 24.13
Fertilizer by Moisture Cell Means for Dry Matter

lsmeans mst*fr / diffs;

                         Moisture
Fertilizer    10         20         30         40
2             3.1220     5.9711     6.7151     5.7957
4             4.0993     7.8271     9.4361     6.3307
6             5.1825    10.6988    12.2790    10.1836
8             6.1250    14.2358    15.4347    11.7663


polynomials can easily be used to investigate the trends over the levels of fertilizer for each level of moisture and the trends over the levels of moisture for each level of fertilizer. The contrasts that measure the linear and quadratic trends of fertilizer at the ith level of moisture (a contrast of the subplot treatments at the same whole plot treatment) are

$$\delta_{LinF|M_i} = -3\mu_{i1} - \mu_{i2} + \mu_{i3} + 3\mu_{i4}, \quad i = 1, 2, 3, 4$$

and

$$\delta_{QuadF|M_i} = \mu_{i1} - \mu_{i2} - \mu_{i3} + \mu_{i4}, \quad i = 1, 2, 3, 4$$

Their estimates are

$$\hat{\delta}_{LinF|M_i} = -3\bar{y}_{i1\cdot} - \bar{y}_{i2\cdot} + \bar{y}_{i3\cdot} + 3\bar{y}_{i4\cdot}, \quad i = 1, 2, 3, 4,$$

and

$$\hat{\delta}_{QuadF|M_i} = \bar{y}_{i1\cdot} - \bar{y}_{i2\cdot} - \bar{y}_{i3\cdot} + \bar{y}_{i4\cdot}, \quad i = 1, 2, 3, 4, \text{ respectively.}$$

The variances of these contrasts are

$$\operatorname{Var}(\hat{\delta}_{LinF|M_i}) = \frac{\sigma_{pot}^2}{3}\left[(-3)^2 + (-1)^2 + 1^2 + 3^2\right] = \frac{20\,\sigma_{pot}^2}{3}$$

and

$$\operatorname{Var}(\hat{\delta}_{QuadF|M_i}) = \frac{\sigma_{pot}^2}{3}\left[1^2 + (-1)^2 + (-1)^2 + 1^2\right] = \frac{4\,\sigma_{pot}^2}{3}, \text{ respectively.}$$

The estimated standard errors are obtained by replacing $\sigma_{pot}^2$ with MSError(pot) in the square root of the respective variances. The SAS-Mixed code to evaluate these contrasts is given in Table 24.14. The estimates of the linear and quadratic trends of fertilizer for each level of moisture and the corresponding estimated standard errors and t-statistics (testing that the trends are zero) are given in Table 24.15.
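
As a cross-check on these formulas, the fertilizer trend contrasts can be evaluated directly from the cell means. The sketch below assumes the cell means of Table 24.13 and the residual mean square 0.752141 from Table 24.12, with r = 3 trays per moisture level; the dictionary layout and names are illustrative.

```python
import math

# Cell means from Table 24.13, keyed by moisture level; each list is
# ordered by fertilizer level 2, 4, 6, 8.
CELL_MEANS = {
    10: [3.1220, 4.0993, 5.1825, 6.1250],
    20: [5.9711, 7.8271, 10.6988, 14.2358],
    30: [6.7151, 9.4361, 12.2790, 15.4347],
    40: [5.7957, 6.3307, 10.1836, 11.7663],
}
MS_POT = 0.752141   # residual (pot) mean square from Table 24.12
R = 3               # trays per moisture level

LINEAR = [-3, -1, 1, 3]
QUADRATIC = [1, -1, -1, 1]


def contrast(coefficients, means):
    return sum(s * m for s, m in zip(coefficients, means))


def standard_error(coefficients):
    # Subplot contrast at one whole plot level: only the pot variance enters.
    return math.sqrt(MS_POT / R * sum(s * s for s in coefficients))


se_linear = standard_error(LINEAR)
se_quadratic = standard_error(QUADRATIC)

linear = {m: contrast(LINEAR, means) for m, means in CELL_MEANS.items()}
quadratic = {m: contrast(QUADRATIC, means) for m, means in CELL_MEANS.items()}

for m in CELL_MEANS:
    print(
        f"moisture {m}: linear {linear[m]:8.4f} (t = {linear[m] / se_linear:6.2f}), "
        f"quadratic {quadratic[m]:8.4f} (t = {quadratic[m] / se_quadratic:6.2f})"
    )
```

Because the cell means are rounded to four decimals, the last digit of an estimate or t-value may differ slightly from SAS output.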

The contrasts that measure the linear and quadratic trends of moisture at each fertilizer level (comparisons of whole plot treatments at the same subplot treatment) are

$$\lambda_{LinM|F_j} = -3\mu_{1j} - \mu_{2j} + \mu_{3j} + 3\mu_{4j}, \quad j = 1, 2, 3, 4$$


TABLE 24.14
SAS-Mixed Code with Estimate Statements Used to Evaluate the Linear and Quadratic Trends

proc mixed cl covtest method=reml data=ex_243;
class mst tray fr;
model dry_matter=mst|fr / ddfm=KR;
random tray / sub=mst;
lsmeans mst|fr/diffs;
estimate 'LF M10' fr -3 -1 1 3 mst*fr -3 -1 1 3;
estimate 'LF M20' fr -3 -1 1 3 mst*fr 0 0 0 0 -3 -1 1 3;
estimate 'LF M30' fr -3 -1 1 3 mst*fr 0 0 0 0 0 0 0 0 -3 -1 1 3;
estimate 'LF M40' fr -3 -1 1 3 mst*fr 0 0 0 0 0 0 0 0 0 0 0 0 -3 -1 1 3;
estimate 'QF M10' fr 1 -1 -1 1 mst*fr 1 -1 -1 1;
estimate 'QF M20' fr 1 -1 -1 1 mst*fr 0 0 0 0 1 -1 -1 1;
estimate 'QF M30' fr 1 -1 -1 1 mst*fr 0 0 0 0 0 0 0 0 1 -1 -1 1;
estimate 'QF M40' fr 1 -1 -1 1 mst*fr 0 0 0 0 0 0 0 0 0 0 0 0 1 -1 -1 1;
estimate 'LM F2' mst -3 -1 1 3 mst*fr -3 0 0 0 -1 0 0 0 1 0 0 0 3;
estimate 'LM F4' mst -3 -1 1 3 mst*fr 0 -3 0 0 0 -1 0 0 0 1 0 0 0 3;
estimate 'LM F6' mst -3 -1 1 3 mst*fr 0 0 -3 0 0 0 -1 0 0 0 1 0 0 0 3;
estimate 'LM F8' mst -3 -1 1 3 mst*fr 0 0 0 -3 0 0 0 -1 0 0 0 1 0 0 0 3;
estimate 'QM F2' mst 1 -1 -1 1 mst*fr 1 0 0 0 -1 0 0 0 -1 0 0 0 1;
estimate 'QM F4' mst 1 -1 -1 1 mst*fr 0 1 0 0 0 -1 0 0 0 -1 0 0 0 1;
estimate 'QM F6' mst 1 -1 -1 1 mst*fr 0 0 1 0 0 0 -1 0 0 0 -1 0 0 0 1;
estimate 'QM F8' mst 1 -1 -1 1 mst*fr 0 0 0 1 0 0 0 -1 0 0 0 -1 0 0 0 1;
TABLE 24.15
Estimates of the Linear and Quadratic Trends of Fertilizer for Each Level of Moisture

                   Linear                  Quadratic
Moisture    Estimate    t-Value     Estimate    t-Value
10          10.0922      4.51       -0.03483    -0.03
20          27.6660     12.36        1.6810      1.68
30          29.0017     12.95        0.4346      0.43
40          21.7646      9.72        1.0478      1.05

Note: The estimated standard errors for the linear and quadratic trends are 2.239 and 1.001, respectively. Compare t-values with $t_{\alpha/2,\,24}$.


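and

$$\lambda_{QuadM|F_j} = \mu_{1j} - \mu_{2j} - \mu_{3j} + \mu_{4j}, \quad j = 1, 2, 3, 4, \text{ respectively.}$$

The estimates and corresponding variances of these comparisons of the levels of moisture within each level of fertilizer are

$$\hat{\lambda}_{LinM|F_j} = -3\bar{y}_{1j\cdot} - \bar{y}_{2j\cdot} + \bar{y}_{3j\cdot} + 3\bar{y}_{4j\cdot}, \quad j = 1, 2, 3, 4$$

$$\hat{\lambda}_{QuadM|F_j} = \bar{y}_{1j\cdot} - \bar{y}_{2j\cdot} - \bar{y}_{3j\cdot} + \bar{y}_{4j\cdot}, \quad j = 1, 2, 3, 4$$

and

$$\operatorname{Var}(\hat{\lambda}_{LinM|F_j}) = \frac{\sigma_{pot}^2 + \sigma_{tray}^2}{3}\left[(-3)^2 + (-1)^2 + 1^2 + 3^2\right] = \frac{20\left(\sigma_{pot}^2 + \sigma_{tray}^2\right)}{3}$$

and

$$\operatorname{Var}(\hat{\lambda}_{QuadM|F_j}) = \frac{\sigma_{pot}^2 + \sigma_{tray}^2}{3}\left[1^2 + (-1)^2 + (-1)^2 + 1^2\right] = \frac{4\left(\sigma_{pot}^2 + \sigma_{tray}^2\right)}{3}$$

The estimated standard errors are obtained by replacing $\sigma_{pot}^2 + \sigma_{tray}^2$ with

$$\hat{\sigma}_{pot}^2 + \hat{\sigma}_{tray}^2 = \frac{MSError(tray) + (4 - 1)\,MSError(pot)}{4}$$

in the square root of the respective variances. The Satterthwaite approximation can be used to obtain the approximate degrees of freedom to be associated with $\hat{\sigma}_{pot}^2 + \hat{\sigma}_{tray}^2$. For this example

$$\hat{\sigma}_{pot}^2 + \hat{\sigma}_{tray}^2 = \frac{3.406 + (4 - 1)0.752}{4} = 1.416$$

which is based on $\hat{\nu}$ degrees of freedom where

$$\hat{\nu} = \frac{[3.406 + (4 - 1)0.752]^2}{\dfrac{[3.406]^2}{8} + \dfrac{[(4 - 1)0.752]^2}{24}} \approx 19.3$$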

The estimates of the linear and quadratic trends of fertilizer for each level of moisture and 
the corresponding estimated standard errors and t-statistics are given in Table 24.15. Table 
24.15 shows that there is a linear response to fertilizer for each level of moisture and no 
significant quadratic trend. Table 24.16 shows that there are both linear and quadratic 
responses to moisture for each level of fertilizer. The graphs in Figures 24.5 and 24.6 show 
the response to fertilizer for each moisture level and the response to moisture for each 
fertilizer level. A Satterthwaite approximation was used to determine the approximate 
degrees of freedom given in Table 24.16. 
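
The moisture trend contrasts, their pooled variance estimate, and the Satterthwaite degrees of freedom can be reproduced from the raw data. The sketch below assumes the pot measurements of Table 24.11 and the mean squares of Table 24.12 (3.406447 for trays on 8 df, 0.752141 for pots on 24 df); names such as `DRY_MATTER` and `cell_mean` are illustrative.

```python
import math

# Table 24.11: for each moisture level, three trays, each with the
# dry matter for fertilizer levels 2, 4, 6, 8 (in that order).
DRY_MATTER = {
    10: [[3.3458, 4.3170, 4.5572, 5.8794],
         [4.0444, 4.1413, 6.5173, 7.3776],
         [1.9758, 3.8397, 4.4730, 5.1180]],
    20: [[5.0490, 7.9419, 10.7697, 13.5168],
         [5.9131, 8.5129, 10.3934, 13.9157],
         [6.9511, 7.0265, 10.9334, 15.2750]],
    30: [[6.5693, 10.7348, 12.2626, 15.7133],
         [8.2974, 8.9081, 13.4373, 14.9575],
         [5.2785, 8.6654, 11.1372, 15.6332]],
    40: [[6.8393, 9.0842, 10.3654, 12.5144],
         [6.4997, 6.0702, 10.7486, 12.5034],
         [4.0482, 3.8376, 9.4367, 10.2811]],
}
FERTILIZER = [2, 4, 6, 8]
MOISTURE = sorted(DRY_MATTER)
MS_TRAY, DF_TRAY = 3.406447, 8
MS_POT, DF_POT = 0.752141, 24
R, C = 3, 4

LINEAR = [-3, -1, 1, 3]
QUADRATIC = [1, -1, -1, 1]


def cell_mean(moisture, j):
    trays = DRY_MATTER[moisture]
    return sum(tray[j] for tray in trays) / len(trays)


def trend(coefficients, j):
    # Contrast of the four moisture means at fertilizer index j.
    return sum(u * cell_mean(m, j) for u, m in zip(coefficients, MOISTURE))


# Pooled estimate of sigma^2_pot + sigma^2_tray and its Satterthwaite df.
pooled = (MS_TRAY + (C - 1) * MS_POT) / C
nu = (MS_TRAY + (C - 1) * MS_POT) ** 2 / (
    MS_TRAY ** 2 / DF_TRAY + ((C - 1) * MS_POT) ** 2 / DF_POT
)

se_linear = math.sqrt(pooled / R * sum(u * u for u in LINEAR))
se_quadratic = math.sqrt(pooled / R * sum(u * u for u in QUADRATIC))

linear = {f: trend(LINEAR, j) for j, f in enumerate(FERTILIZER)}
quadratic = {f: trend(QUADRATIC, j) for j, f in enumerate(FERTILIZER)}

print(f"pooled variance {pooled:.3f}, Satterthwaite df {nu:.1f}")
for f in FERTILIZER:
    print(
        f"fertilizer {f}: linear {linear[f]:8.4f} (t = {linear[f] / se_linear:5.2f}), "
        f"quadratic {quadratic[f]:9.4f} (t = {quadratic[f] / se_quadratic:5.2f})"
    )
```

The resulting standard errors and t-values agree with those reported in Table 24.16.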


24.6.2 Example 24.4: Regression with Split-Plot Errors 

The levels of moisture and fertilizer in Example 24.3 are quantitative levels, so it may be of interest to investigate whether a regression model can be constructed that will describe the data. This is a different model from the usual regression situation since the data are correlated. Mixed models software is required to carry out the computations. To obtain an appropriate analysis, one needs to use moisture as a continuous variable in the regression model and as a class variable in the random statement. Mst is used to denote moisture as a continuous variable and mstblk (which is equal to mst) is used to denote the class variable. Table 24.17 contains the SAS-Mixed code where the model statement is a general cubic regression model for moisture (mst) and fertilizer (fr). The random statement inserts the 


TABLE 24.16
Estimates of the Linear and Quadratic Trends of Moisture for Each Level of Fertilizer

                    Linear                  Quadratic
Fertilizer    Estimate    t-Value     Estimate     t-Value
2              8.7652      2.85        -3.7684     -2.74
4              8.3030      2.70        -6.8332     -4.97
6             16.5834      5.40        -7.6118     -5.54
8             18.1227      5.90       -11.7792     -8.57

Note: The estimated standard errors for the linear and quadratic trends are 3.072 and 1.374, respectively. Compare t-values with $t_{\alpha/2,\,19.3}$.

FIGURE 24.5 Graphic of dry matter means against the level of fertilizer for each level of moisture. 


tray error term and imposes the correlation structure on the data. The estimates of variance components are given in Table 24.17 and the solutions for the regression coefficients are given in Table 24.18. Most of the significance levels are quite large, so several steps of a deletion process (there are no automatic processes in SAS-Mixed) were carried out until all of the remaining variables had coefficients that were significantly different from zero. The reduced model is

$$DM_{ijk} = \beta_0 + \beta_1\,mst_i + \beta_2\,fr_j + \beta_3\,(mst_i)(fr_j) + \beta_4\,(mst_i)^2(fr_j) + \beta_5\,(mst_i)(fr_j)^2 + t_{ik} + p_{ijk}$$


It is of interest to determine if the reduced model adequately describes the data, so a lack of fit test is constructed. Let frx = fr and include frx in the class statement. Include mstblk x frx in the reduced model, as shown in Table 24.19. The F-test corresponding to mstblk x frx in the 

FIGURE 24.6 Graphic of dry matter means against the level of moisture for each level of fertilizer. 

TABLE 24.17
SAS-Mixed Code and Covariance Parameter Estimates for Full Regression Model

proc mixed data=ex_243 cl covtest;
class mstblk tray;
model dry_matter=mst mst*mst mst*mst*mst fr fr*fr fr*fr*fr mst*fr fr*mst*mst
      fr*mst*mst*mst mst*fr*fr mst*mst*fr*fr mst*mst*mst*fr*fr
      mst*fr*fr*fr mst*mst*fr*fr*fr mst*mst*mst*fr*fr*fr
      / solution ddfm=KR;
random tray(mstblk);

Covariance Parameter Estimates

Covariance Parameter    Estimate    Standard Error    Z-Value    Pr Z      α       Lower     Upper
Tray(MSTBLK)            0.6636      0.4293            1.55       0.0611    0.05    0.2544    4.2338
Residual                0.7521      0.2171            3.46       0.0003    0.05    0.4586    1.4556

TABLE 24.18
Solution for the Parameters for the Full Response Surface Model
Solution for Fixed Effects

Effect                               Estimate     Standard Error    df      t-Value    Pr > |t|
Intercept                            -21.3654     34.7678           24.8    -0.61      0.5445
MST                                    3.9747      5.3173           24.8     0.75      0.4618
MST x MST                             -0.1861      0.2349           24.8    -0.79      0.4358
MST x MST x MST                        0.002723    0.003120         24.8     0.87      0.3911
FR                                    14.7827     26.4179           24       0.56      0.5810
FR x FR                               -2.9296      5.8356           24      -0.50      0.6202
FR x FR x FR                           0.1556      0.3875           24       0.40      0.6916
MST x FR                              -2.5838      4.0403           24      -0.64      0.5286
MST x MST x FR                         0.1329      0.1785           24       0.74      0.4637
MST x MST x MST x FR                  -0.00205     0.002370         24      -0.87      0.3948
MST x FR x FR                          0.5263      0.8925           24       0.59      0.5609
MST x MST x FR x FR                   -0.02671     0.03943          24      -0.68      0.5046
MST x MST x MST x FR x FR              0.000413    0.000524         24       0.79      0.4380
MST x FR x FR x FR                    -0.02884     0.05926          24      -0.49      0.6310
MST x MST x FR x FR x FR               0.001518    0.002618         24       0.58      0.5674
MST x MST x MST x FR x FR x FR        -0.00002     0.000035         24      -0.70      0.4934


TABLE 24.19
SAS-Mixed Code to Test for the Lack of Fit of the Reduced Regression Model

proc mixed data=ex_243 cl covtest;
class mstblk FRX tray;
model dry_matter=mst fr mst*fr fr*mst*mst mst*fr*fr FRX*MSTBLK
      / ddfm=KR solution;
random TRAY(mstblk);

Covariance Parameter Estimates

Covariance Parameter    Estimate    Standard Error    Z-Value    Pr Z      α       Lower     Upper
Tray(MSTBLK)            0.6636      0.4293            1.55       0.0611    0.05    0.2544    4.2338
Residual                0.7521      0.2171            3.46       0.0003    0.05    0.4586    1.4556


analysis of variance table in Table 24.20 provides the test of lack of fit of the reduced model. In this case, the significance level corresponding to the test of lack of fit is 0.5561, indicating that the reduced model adequately describes the data. The code in Table 24.21 fits the reduced model and displays the estimates of the covariance parameters; the estimates of the regression coefficients are in Table 24.22. The estimate of the tray variance component from the reduced model is somewhat less than that from the full model, while the residual or pot variance estimates are similar. Predicted values from the reduced model are displayed in Figure 24.7 and the moisture by fertilizer cell means are graphed in Figure 24.8. The regression model does an adequate job of describing the cell means, as would be expected from the results of the test for lack of fit.
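
To see how the reduced model reproduces the response surface, its fixed-effects part can be evaluated at the design points. The sketch below assumes the coefficient estimates reported in Table 24.22; the function name `predicted_dry_matter` is illustrative, and because the printed coefficients are rounded, the predictions are approximate.

```python
# Fixed-effect estimates of the reduced model, as reported in Table 24.22.
COEF = {
    "intercept": 1.9536,
    "mst": 0.07531,
    "fr": -1.0795,
    "mst*fr": 0.1730,
    "mst*mst*fr": -0.00346,
    "mst*fr*fr": 0.001838,
}


def predicted_dry_matter(mst, fr):
    """Evaluate the fixed-effects part of the reduced regression model."""
    return (
        COEF["intercept"]
        + COEF["mst"] * mst
        + COEF["fr"] * fr
        + COEF["mst*fr"] * mst * fr
        + COEF["mst*mst*fr"] * mst ** 2 * fr
        + COEF["mst*fr*fr"] * mst * fr ** 2
    )


# Predictions on the 4 x 4 grid of moisture and fertilizer levels.
for fr in (2, 4, 6, 8):
    row = [predicted_dry_matter(mst, fr) for mst in (10, 20, 30, 40)]
    print(f"fertilizer {fr}: " + "  ".join(f"{value:6.2f}" for value in row))
```

Comparing the printed grid with the cell means in Table 24.13 gives a quick informal view of the fit that the lack of fit test assesses formally.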
TABLE 24.20
Test for Lack of Fit in MSTBLK x FRX
Type III Tests of Fixed Effects

Effect            Num df    Den df    F-Value    Pr > F
MST               0         -         -          -
FR                0         -         -          -
MST x FR          0         -         -          -
MST x MST x FR    0         -         -          -
MST x FR x FR     0         -         -          -
MSTBLK x FRX      10        24.7      0.89       0.5561


TABLE 24.21
SAS-Mixed Code for Final Regression Model and Covariance Parameter Estimates

proc mixed data=ex_243 cl covtest;
title3 'reduced regresson model with split-plot errors';
title4 'Method=REML';
class mstblk tray;
model dry_matter=mst fr mst*fr fr*mst*mst mst*fr*fr
      / solution ddfm=KR;
RANDOM TRAY(mstblk);

Covariance Parameter Estimates

Covariance Parameter    Estimate    Standard Error    Z-Value    Pr Z       α       Lower     Upper
Tray(MSTBLK)            0.5582      0.3498            1.60       0.0553     0.05    0.2189    3.2798
Residual                0.7575      0.1881            4.03       <0.0001    0.05    0.4912    1.3195


24.6.3 Example 24.5: Mixed-Up Split-Plot Design 

Two researchers designed a study to evaluate how three varieties of soybeans respond to four different types of herbicides. At site 1, the first researcher used a split-plot design in two replications where the herbicides were the levels of the whole-plots and the varieties of soybeans were the levels of the subplots. At site 2, the second researcher used a split-plot design in two replications, but assigned the levels of varieties to the whole-plots and the levels of herbicides to the subplots. A combined analysis was desired, but since the two designs are very different, a combined analysis did not seem possible. However, remember from Chapter 5 that the split-plot design structure is nothing other than an incomplete block design where the whole-plots are the blocks of the incomplete block design. The treatment combinations in each of the 14 whole-plots or incomplete blocks are shown in Table 24.23, where VxHy denotes variety x with herbicide y. The first eight whole-plots (from site 1) are incomplete blocks of size three and the last six whole-plots (from site 2) are incomplete blocks of size four. A combined analysis can be accomplished using the model

$$y_{ijkl} = \mu_{kl} + b_i + w_{ij} + \varepsilon_{ijkl},$$
FIGURE 24.7 Prediction surface for dry matter as a function of moisture and fertilizer. (Plot title: Predicted dry matter from response surface with split-plot design structure; vertical axis: Dry matter (predicted), ranging from 3.39 to 15.74.)

FIGURE 24.8 Means for the combinations of fertilizer and moisture. (Plot title: Mean dry matter from two-way treatment structure with split-plot design structure.)

TABLE 24.22
Solution for the Parameters of the Reduced Regression Model
Solution for Fixed Effects

Effect            Estimate     Standard Error    df      t-Value    Pr > |t|
Intercept          1.9536      0.9204            33       2.12      0.0414
MST                0.07531     0.04069           40.6     1.85      0.0715
FR                -1.0795      0.2419            37.8    -4.46      <0.0001
MST x FR           0.1730      0.02351           36.5     7.36      <0.0001
MST x MST x FR    -0.00346     0.000398          25.4    -8.70      <0.0001
MST x FR x FR      0.001838    0.001147          32.4     1.60      0.1187

where $\mu_{kl}$ is the mean response of the kth variety and the lth herbicide, $b_i \sim$ i.i.d. $N(0, \sigma_{blk}^2)$ are the large block effects within each site, $w_{ij} \sim$ i.i.d. $N(0, \sigma_w^2)$ denote the whole-plot or incomplete block effects, and $\varepsilon_{ijkl} \sim$ i.i.d. $N(0, \sigma_\varepsilon^2)$ are the subplot effects. The data are in Tables 24.24 and 24.25 and the SAS-Mixed code and covariance parameter estimates are given in Table 24.26. The model statement contains the treatment structure with varieties (V), herbicides (H), and variety by herbicide interaction. The random statement contains the blocks (replications within a site) and whole-plots within a block (the incomplete blocks). The tests of the fixed effects are given in Table 24.27 and the means are given in Table 24.28. There is a significant variety by herbicide interaction; thus comparisons need to be made using the cell means. For comparisons of the levels of herbicide within each of the varieties, the estimated standard error of the difference is 3.33 with 26.1 df. For comparisons of the varieties within each of the herbicides, the estimated standard error of the difference is 3.26 with 24.7 df.


TABLE 24.23
Combinations of Varieties and Herbicides Assigned to Each of the Whole Plots within Each Block

Block    Whole Plot    Treatment Combinations
1         1            V1H1, V2H1, V3H1
1         2            V1H2, V2H2, V3H2
1         3            V1H3, V2H3, V3H3
1         4            V1H4, V2H4, V3H4
2         5            V1H1, V2H1, V3H1
2         6            V1H2, V2H2, V3H2
2         7            V1H3, V2H3, V3H3
2         8            V1H4, V2H4, V3H4
3         9            V1H1, V1H2, V1H3, V1H4
3        10            V2H1, V2H2, V2H3, V2H4
3        11            V3H1, V3H2, V3H3, V3H4
4        12            V1H1, V1H2, V1H3, V1H4
4        13            V2H1, V2H2, V2H3, V2H4
4        14            V3H1, V3H2, V3H3, V3H4

TABLE 24.24
Data from First Location for Example 24.5 Where H and V Denote Herbicides and Varieties, Respectively, and the Entries in the Table Are Weights of Soybeans

Block    WP    H    V1      V2      V3
1        1     1    22.4    39.7    30.4
1        2     2    36.6    38.3    33.7
1        3     3    38.2    33.0    34.6
1        4     4    19.9    28.0    19.6
2        5     1    25.7    32.5    21.7
2        6     2    26.7    25.4    13.2
2        7     3    26.6    28.2    34.6
2        8     4    22.5    18.5    18.0

TABLE 24.25
Data from Second Location for Example 24.5 Where H and V Denote Herbicides and Varieties, Respectively, and the Entries in the Table Are Weights of Soybeans

Block    WP    V    H1      H2      H3      H4
3         9    1    21.5    24.6    30.0    28.8
3        10    2    29.3    33.8    35.1    35.1
3        11    3    28.9    29.0    41.2    38.4
4        12    1    17.3    25.5    23.7    18.6
4        13    2    29.2    29.9    26.8    20.4
4        14    3    14.3    27.7    43.3    30.2


TABLE 24.26
SAS-Mixed Code and Estimates of the Covariance Parameters for Example 24.5

proc mixed data=one covtest cl;
class block wp V H;
model yield=V|H / ddfm=kr;
random block wp(block);
lsmeans V H V*H / diff;

Covariance Parameter Estimates

Covariance Parameter    Estimate    Standard Error    Z-Value    Pr Z      α       Lower      Upper
Block                    6.7190     11.3440           0.59       0.2768    0.05     1.1382    120756
WP(block)               18.2370     12.8963           1.41       0.0787    0.05     6.5461    150.62
Residual                17.4321      5.3451           3.26       0.0006    0.05    10.3483    35.4086


TABLE 24.27

Tests of the Fixed Effects for Example 24.5

Type III Tests of Fixed Effects

Effect   Num df   Den df   F-Value   Pr > F
V        2        30.4     2.17      0.1320
H        3        31.3     5.07      0.0056
V × H    6        21.3     3.64      0.0122


TABLE 24.28

Estimates of the Variety by Herbicide Cell Means, the Variety Means and the
Herbicide Means for Example 24.5

V   H   Estimate   Standard Error     V   Estimate   Standard Error     H   Estimate   Standard Error
1   1   21.31      2.99               1   26.94      2.18               1   24.25      2.32
1   2   29.39      2.99               2   30.56      2.18               2   28.32      2.32
1   3   31.09      2.99               3   26.87      2.18               3   32.99      2.32
1   4   25.95      2.99                                                 4   26.92      2.32
2   1   31.22      2.99
2   2   31.85      2.99
2   3   31.20      2.99
2   4   27.96      2.99
3   1   20.21      2.99
3   2   23.73      2.99
3   3   36.68      2.99
3   4   26.85      2.99


24.6.4 Example 24.6: Split-Split-Plot Design

A split-split-plot design has three sizes of experimental units, whole-plot, subplot, and
sub-subplot, where the whole-plot, subplot, and sub-subplot are the largest, medium, and smallest
size experimental units, respectively. The present study involves evaluating the effects of
combinations of two rations (regular corn and high oil corn), two temperatures
(3 and 6°C), and three types of packaging (vacuum, CO2, and low O2) on the tenderness of
meat. Twenty steers were randomly assigned to the two rations (10 per ration). At slaughter,
each animal was split in halves and a loin from each side was extracted. Each loin was
randomly assigned to one of the two storage temperatures. After 10 days of storage, three steaks
were cut from each loin and each level of packaging was assigned to one of those steaks. Each
steak was put into a display case for seven more days. After the seven days in the display
case, five cores were obtained from each steak and the force required to shear each core was
measured. The data in Table 24.29 are the means of the five cores. Packaging problems
caused some of the steaks not to be representative of the true state of nature, and they were
deleted from the data set.

This study involves three sizes of experimental units. The animal is the experimental 
unit to which the levels of ration are assigned. The side of an animal is the experimental 
unit for the levels of temperature, and the steak is the experimental unit for the levels of 
packaging. A model that can be used to describe the shear forces ($f_{ijkl}$) is

$$f_{ijkl} = \mu_{ikl} + a_{ij} + s_{ijk} + \varepsilon_{ijkl}, \quad i = 1, 2,\; j = 1, 2, \ldots, 10,\; k = 1, 2,\; l = 1, 2, 3$$


TABLE 24.29

Shear Force Data for Example 24.6

                    Temperature 3°C               Temperature 6°C
Ration   Animal   Vacuum   CO2      Low O2      Vacuum   CO2      Low O2
1        1        9.287    8.697    9.708       6.592    -        7.771
1        2        8.192    8.782    9.961       -        8.782    -
1        3        4.487    4.234    -           4.739    2.887    3.055
1        4        -        -        8.613       -        5.329    2.718
1        5        2.382    4.824    4.992       2.971    6.339    4.487
1        6        6.508    5.582    8.445       3.982    7.182    7.855
1        7        6.171    4.908    5.497       3.561    6.171    2.297
1        8        7.266    -        6.171       8.950    5.413    -
1        9        4.487    5.413    6.424       5.329    4.571    -
1        10       -        7.771    6.845       -        4.403    -
2        1        2.718    5.918    6.171       5.918    7.266    8.697
2        2        -        7.350    10.80       4.655    -        -
2        3        5.750    5.076    7.687       3.813    4.066    7.687
2        4        5.666    4.739    5.918       6.339    -        7.434
2        5        6.761    7.687    -           4.234    5.413    8.866
2        6        6.003    6.508    -           4.487    9.455    -
2        7        5.918    8.782    -           7.182    8.950    -
2        8        4.571    5.582    11.06       4.824    9.287    9.455
2        9        -        5.918    8.024       -        7.182    7.434
2        10       2.803    -        -           -        7.434    7.097

where $\mu_{ikl}$ denotes the mean response of the ith ration, kth temperature and lth packaging,
$a_{ij} \sim$ i.i.d. $N(0, \sigma^2_{animal})$ denote the animal errors, $s_{ijk} \sim$ i.i.d. $N(0, \sigma^2_{side})$ denote the side errors
and $\varepsilon_{ijkl} \sim$ i.i.d. $N(0, \sigma^2_{steak})$ denote the steak errors.

The animal error term can be obtained by ignoring the temperature and packaging parts 
of the treatment structure. The animal design is a one-way treatment structure in a 
completely randomized design structure where there are 10 animals assigned to each 
ration. Thus the animal error term is computed as the variation among animals treated 
alike within a ration pooled across rations, denoted by animal (ration). The side error term 
is obtained by considering only those animals assigned to ration 1 and ignoring the
packaging effects. Each animal within ration 1 forms a block of size 2 (two sides) for the two
temperatures; thus the side design is a one-way treatment structure (two temperatures) in
a randomized complete block design structure (10 animals within ration 1). So the side
error term is computed as the temperature by animal interaction within a ration pooled
across rations, denoted by temp × animal(ration). The residual is the steak error term, and its
form can be obtained by considering the data from ration 1 and temperature 3°C. The side
of an animal from ration 1 assigned to 3°C is the blocking factor for the three types of
packaging. The steak design is a one-way treatment structure (three levels of packaging) in a
randomized complete block design structure (10 sides of animals assigned to ration 1 stored at
3°C). The package by animal interaction is the error term for this part of the
data. The computational form of the residual is package × animal(temp × ration). If the data set
were balanced, there would be 18, 18, and 72 degrees of freedom for the animal error, side error,
and steak error terms, respectively.

The SAS-Mixed code for fitting the split-split-plot model to the shear force data is in
Table 24.30, where the random statement contains animal(ration) and temp × animal(ration),
the whole-plot error and the subplot error, respectively. The estimates of the covariance
parameters are also included in Table 24.30. The tests for the fixed effects are given in
Table 24.31, where there are significant ration × temperature and ration × packaging
interactions. The ration × temperature means are given in Table 24.32 and pairwise comparisons
among them are given in Table 24.33. The adjust=tukey option was used on the lsmeans
statements to provide adjusted p-values from the Tukey multiple comparison method. The
estimated standard errors for comparing temperatures within a ration are close to 0.45,
while the estimated standard errors for comparing rations within a temperature are
around 0.679. Comparisons of temperatures within a ration are within-animal comparisons,
while comparisons of rations within a temperature are between-animal comparisons and thus
have the larger estimated standard errors. Tables 24.34 and 24.35 contain the means and
pairwise comparisons for the ration × packaging means. The estimated standard errors for
comparing packaging within a ration are about 0.49, while the estimated standard errors
for comparing rations within a package type are about 0.71. The packaging comparisons are


TABLE 24.30

SAS-Mixed Code and Estimates of the Covariance Parameters for Example 24.6

proc mixed data=one covtest cl;
class ration animal temp pack;
model wbsf=ration|temp|pack/ddfm=kr;
random animal(ration) temp*animal(ration);
lsmeans ration*temp/diff adjust=tukey;
lsmeans ration*pack/diff adjust=tukey;
run;

Covariance Parameter Estimates

                                           Standard
Covariance Parameter            Estimate   Error     Z-Value   Pr Z      α      Lower     Upper
Animal(ration)                  1.2434     0.6102    2.04      0.0208    0.05   0.5740    4.4215
Animal × temperature(ration)    0.1992     0.3612    0.55      0.2906    0.05   0.03165   15986
Residual                        1.7529     0.3600    4.87      <0.0001   0.05   1.2166    2.7440

TABLE 24.31

Tests of the Fixed Effects for Example 24.6

Type III Tests of Fixed Effects

Effect                        Num df   Den df   F-Value   Pr > F
Ration                        1        18       1.89      0.1864
Temperature                   1        16.7     1.63      0.2188
Ration × temperature          1        16.7     5.47      0.0321
Pack                          2        53.7     13.80     <0.0001
Ration × pack                 2        53.7     8.34      0.0007
Temperature × pack            2        54.6     2.24      0.1157
Ration × temperature × pack   2        54.6     0.53      0.5899


TABLE 24.32

Ration and Temperature Force Means for Example 24.6

Ration   Temperature   Estimate   Standard Error
1        3             6.5512     0.4682
1        6             5.3947     0.4852
2        3             6.6181     0.4777
2        6             6.9574     0.4748

TABLE 24.33

Pairwise Comparisons of the Ration by Temperature Means with the Tukey Adjustment for
Multiple Comparisons

Ration   Temperature   Ration   Temperature   Estimate   Standard Error   df     t-Value   Adj p
1        3             1        6             1.1565     0.4529           16.5   2.55      0.0874
1        3             2        3             -0.06693   0.6689           27     -0.10     0.9996
1        3             2        6             -0.4062    0.6669           26.7   -0.61     0.9278
1        6             2        3             -1.2234    0.6809           28.5   -1.80     0.3091
1        6             2        6             -1.5627    0.6789           28.2   -2.30     0.1374
2        3             2        6             -0.3392    0.4517           16.9   -0.75     0.8750

TABLE 24.34

Ration and Packaging Force Means for Example 24.6

Ration   Pack   Estimate   Standard Error
1        1      5.7044     0.5158
1        2      6.0996     0.4927
1        3      6.1149     0.5166
2        1      5.0572     0.5014
2        2      6.8470     0.4928
2        3      8.4591     0.5357

made within a side of animal while the ration comparisons are made between animals, 
explaining the differences in magnitude of the two estimated standard errors. 
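
The size of the two standard errors can be anticipated from the model. The following is only a sketch under the assumption of a balanced data set with $n$ animals per ration and no deleted steaks; the symbols $\bar{f}_{i\cdot\cdot l}$ (ration $i$, packaging $l$ means averaged over animals and temperatures) are notation introduced here for illustration. Because the actual data set is unbalanced, the reported standard errors (about 0.49 and 0.71) are somewhat larger than these balanced-case values.

$$
\begin{aligned}
\mathrm{Var}(\bar{f}_{i\cdot\cdot l} - \bar{f}_{i\cdot\cdot l'}) &= \frac{2\sigma^2_{steak}}{2n} = \frac{\sigma^2_{steak}}{n}
&&\text{(packaging within a ration: animal and side effects cancel)}\\
\mathrm{Var}(\bar{f}_{1\cdot\cdot l} - \bar{f}_{2\cdot\cdot l}) &= \frac{1}{n}\left(2\sigma^2_{animal} + \sigma^2_{side} + \sigma^2_{steak}\right)
&&\text{(rations within a packaging type)}
\end{aligned}
$$

With $n = 10$ and the estimates in Table 24.30, these give $\sqrt{1.7529/10} \approx 0.42$ and $\sqrt{(2(1.2434) + 0.1992 + 1.7529)/10} \approx 0.67$, respectively.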


24.7 Sample Size and Power Considerations 

The sample size needed for a particular study depends on the comparisons that are most 
interesting. The basic sample size equation for detecting a difference of $\delta$ between two
means, where the variance of the difference is $2\sigma^2/n$, $\hat{\sigma}^2$ is the estimate of $\sigma^2$ based on $v$
degrees of freedom, the type I error rate is $\alpha$, and the type II error rate is $\beta$, is

$$n = \frac{2\hat{\sigma}^2}{\delta^2}\,[t_{\alpha/2,v} + t_{\beta,v}]^2$$


TABLE 24.35

Pairwise Comparisons of the Ration by Packaging Force Means with the Tukey
Adjustment for Multiple Comparisons

Ration   Pack   Ration   Pack   Estimate   Standard Error   df     t-Value   Adj p
1        1      1        2      -0.3952    0.4881           52.2   -0.81     0.9645
1        1      1        3      -0.4104    0.5089           52.1   -0.81     0.9651
1        1      2        1      0.6472     0.7193           34.7   0.90      0.9449
1        1      2        2      -1.1425    0.7134           34     -1.60     0.6011
1        1      2        3      -2.7547    0.7437           38     -3.70     0.0063
1        2      1        3      -0.01522   0.4952           56.1   -0.03     1.0000
1        2      2        1      1.0424     0.7030           33     1.48      0.6764
1        2      2        2      -0.7473    0.6969           32.2   -1.07     0.8901
1        2      2        3      -2.3595    0.7279           36.4   -3.24     0.0237
1        3      2        1      1.0577     0.7199           35.3   1.47      0.6849
1        3      2        2      -0.7321    0.7139           34.6   -1.03     0.9073
1        3      2        3      -2.3443    0.7442           38.7   -3.15     0.0303
2        1      2        2      -1.7898    0.4751           54     -3.77     0.0053
2        1      2        3      -3.4019    0.5269           56.8   -6.46     <0.0001
2        2      2        3      -1.6122    0.5083           50.8   -3.17     0.0286


The sample size equation can be used to evaluate the power of a test by solving for $t_{\beta,v}$
and then determining the power as the value of $1 - \beta$. The equation for $t_{\beta,v}$ is

$$t_{\beta,v} = \sqrt{\frac{n\delta^2}{2\hat{\sigma}^2}} - t_{\alpha/2,v}$$

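Suppose for Example 24.6 that it is of interest to detect a difference of 1.5 force units
between ration means within a level of temperature. The parameter of interest
representing the difference of the two rations at the first temperature is $\bar{\mu}_{11\cdot} - \bar{\mu}_{21\cdot}$. If the
data set were balanced, the best estimate of $\bar{\mu}_{11\cdot} - \bar{\mu}_{21\cdot}$ is $\bar{f}_{1\cdot1\cdot} - \bar{f}_{2\cdot1\cdot}$. The model for $\bar{f}_{i\cdot k\cdot}$ is
$\bar{f}_{i\cdot k\cdot} = \bar{\mu}_{ik\cdot} + \bar{a}_{i\cdot} + \bar{s}_{i\cdot k} + \bar{\varepsilon}_{i\cdot k\cdot}$ and the model for $\bar{f}_{1\cdot1\cdot} - \bar{f}_{2\cdot1\cdot}$ is

$$\bar{f}_{1\cdot1\cdot} - \bar{f}_{2\cdot1\cdot} = \bar{\mu}_{11\cdot} - \bar{\mu}_{21\cdot} + \bar{a}_{1\cdot} - \bar{a}_{2\cdot} + \bar{s}_{1\cdot1} - \bar{s}_{2\cdot1} + \bar{\varepsilon}_{1\cdot1\cdot} - \bar{\varepsilon}_{2\cdot1\cdot}$$

The sample size needed is the number of animals per ration, since the number of
sides and the number of steaks per side are fixed at 2 and 3, respectively. The variance
of $\bar{f}_{1\cdot1\cdot} - \bar{f}_{2\cdot1\cdot}$ is

$$\mathrm{Var}(\bar{f}_{1\cdot1\cdot} - \bar{f}_{2\cdot1\cdot}) = \frac{2\sigma^2_{animal}}{n} + \frac{2\sigma^2_{side}}{n} + \frac{2\sigma^2_{steak}}{3n} = \frac{2}{3n}\left(\sigma^2_{steak} + 3\sigma^2_{side} + 3\sigma^2_{animal}\right)$$

The number of animals per ration needed to detect a difference of $\delta$ between the two
means with type I and type II error rates of $\alpha$ and $\beta$, respectively, is approximately

$$n \approx \frac{2}{3}\left(\hat{\sigma}^2_{steak} + 3\hat{\sigma}^2_{side} + 3\hat{\sigma}^2_{animal}\right)[t_{\alpha/2,v} + t_{\beta,v}]^2/\delta^2$$

where $\hat{\sigma}^2_{steak} + 3\hat{\sigma}^2_{side} + 3\hat{\sigma}^2_{animal}$
is an estimate of $\sigma^2_{steak} + 3\sigma^2_{side} + 3\sigma^2_{animal}$ based on $v$ degrees of freedom. The expected mean
squares of the three error terms using a balanced data set are

$$E[\text{MSAnimal(Ration)}] = \sigma^2_{steak} + 3\sigma^2_{side} + 6\sigma^2_{animal}$$

$$E[\text{MSTemp} \times \text{Animal(Ration)}] = \sigma^2_{steak} + 3\sigma^2_{side}$$

and

$$E[\text{MSPack} \times \text{Animal(Temp} \times \text{Ration)}] = \sigma^2_{steak}$$

An unbiased estimate of $\sigma^2_{steak} + 3\sigma^2_{side} + 3\sigma^2_{animal}$ is

$$\hat{\sigma}^2_{steak} + 3\hat{\sigma}^2_{side} + 3\hat{\sigma}^2_{animal} = \frac{1}{2}\,\text{MSAnimal(Ration)} + \frac{1}{2}\,\text{MSTemp} \times \text{Animal(Ration)}$$

with approximate degrees of freedom obtained by the Satterthwaite approximation as

$$\hat{v} = \frac{\left(\hat{\sigma}^2_{steak} + 3\hat{\sigma}^2_{side} + 3\hat{\sigma}^2_{animal}\right)^2}{\dfrac{[\frac{1}{2}\,\text{MSAnimal(Ration)}]^2}{18} + \dfrac{[\frac{1}{2}\,\text{MSTemp} \times \text{Animal(Ration)}]^2}{18}}$$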
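
As a quick check of the unbiasedness claim, the two expected mean squares listed above can be combined directly. This is only a restatement of the balanced-data expectations, with no additional assumptions:

$$
\begin{aligned}
\tfrac{1}{2}E[\text{MSAnimal(Ration)}] + \tfrac{1}{2}E[\text{MSTemp} \times \text{Animal(Ration)}]
&= \tfrac{1}{2}\left(\sigma^2_{steak} + 3\sigma^2_{side} + 6\sigma^2_{animal}\right) + \tfrac{1}{2}\left(\sigma^2_{steak} + 3\sigma^2_{side}\right)\\
&= \sigma^2_{steak} + 3\sigma^2_{side} + 3\sigma^2_{animal}
\end{aligned}
$$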


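When the data set is not balanced, the approximate degrees of freedom can be evaluated
using the estimates of the variance components and their corresponding asymptotic
covariance matrix from REML. Let $f(\sigma^2_{steak}, \sigma^2_{side}, \sigma^2_{animal})$ denote the function of the variance
components that is to be estimated. The estimate of $f(\sigma^2_{steak}, \sigma^2_{side}, \sigma^2_{animal})$ is $f(\hat{\sigma}^2_{steak}, \hat{\sigma}^2_{side}, \hat{\sigma}^2_{animal})$
and the approximate variance is

$$
\mathrm{Var}[f(\hat{\sigma}^2_{steak}, \hat{\sigma}^2_{side}, \hat{\sigma}^2_{animal})] =
\begin{bmatrix} \dfrac{\partial f}{\partial \sigma^2_{steak}} & \dfrac{\partial f}{\partial \sigma^2_{side}} & \dfrac{\partial f}{\partial \sigma^2_{animal}} \end{bmatrix}
\mathrm{Var}\begin{bmatrix} \hat{\sigma}^2_{steak} \\ \hat{\sigma}^2_{side} \\ \hat{\sigma}^2_{animal} \end{bmatrix}
\begin{bmatrix} \dfrac{\partial f}{\partial \sigma^2_{steak}} \\ \dfrac{\partial f}{\partial \sigma^2_{side}} \\ \dfrac{\partial f}{\partial \sigma^2_{animal}} \end{bmatrix}
$$

Evaluate $\mathrm{Var}[f(\hat{\sigma}^2_{steak}, \hat{\sigma}^2_{side}, \hat{\sigma}^2_{animal})]$ at the REML estimates of the variance components and
compute

$$Z = \frac{f(\hat{\sigma}^2_{steak}, \hat{\sigma}^2_{side}, \hat{\sigma}^2_{animal})}{\sqrt{\mathrm{Var}[f(\hat{\sigma}^2_{steak}, \hat{\sigma}^2_{side}, \hat{\sigma}^2_{animal})]}}$$

then the Satterthwaite approximation to the number of degrees of freedom associated with
$f(\hat{\sigma}^2_{steak}, \hat{\sigma}^2_{side}, \hat{\sigma}^2_{animal})$ is $v = 2Z^2$.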
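
The rule that the degrees of freedom equal twice the squared Z-score follows from matching moments with a scaled chi-square distribution. The sketch below assumes that the estimate, written here as $\hat{\theta}$ for a target $\theta$ (symbols introduced only for this illustration), behaves approximately like a multiple of a chi-square variable with $v$ degrees of freedom:

$$
\frac{v\,\hat{\theta}}{\theta} \;\dot\sim\; \chi^2_v
\quad\Longrightarrow\quad
\mathrm{Var}(\hat{\theta}) \approx \frac{2\theta^2}{v}
\quad\Longrightarrow\quad
v \approx \frac{2\hat{\theta}^2}{\widehat{\mathrm{Var}}(\hat{\theta})} = 2Z^2
$$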

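For Example 24.6, use the results of the analysis and determine the number of animals
that are needed to detect a difference of 1.5 shear units with type I and type II error rates
of 0.05 and 0.05, respectively. The estimates of the variance components from Table 24.30
are $\hat{\sigma}^2_{animal} = 1.2434$, $\hat{\sigma}^2_{side} = 0.1992$ and $\hat{\sigma}^2_{steak} = 1.7529$. The estimate of $\sigma^2_{steak} + 3\sigma^2_{side} + 3\sigma^2_{animal}$
is 6.08086. The estimate of the asymptotic covariance matrix of the variance
components is in Table 24.36 and is

$$
V = \widehat{\mathrm{Var}}\begin{bmatrix} \hat{\sigma}^2_{animal} \\ \hat{\sigma}^2_{side} \\ \hat{\sigma}^2_{steak} \end{bmatrix} =
\begin{bmatrix}
0.3723 & -0.06154 & 0.002209 \\
-0.06154 & 0.1305 & -0.05370 \\
0.002209 & -0.05370 & 0.1296
\end{bmatrix}
$$

The approximate variance of $\hat{\sigma}^2_{steak} + 3\hat{\sigma}^2_{side} + 3\hat{\sigma}^2_{animal}$ is then

$$\mathrm{Var}(\hat{\sigma}^2_{steak} + 3\hat{\sigma}^2_{side} + 3\hat{\sigma}^2_{animal}) = \begin{bmatrix} 3 & 3 & 1 \end{bmatrix} V \begin{bmatrix} 3 \\ 3 \\ 1 \end{bmatrix} = 3.23805$$

and the Z-score is

$$Z = \frac{\hat{\sigma}^2_{steak} + 3\hat{\sigma}^2_{side} + 3\hat{\sigma}^2_{animal}}{\sqrt{\mathrm{Var}(\hat{\sigma}^2_{steak} + 3\hat{\sigma}^2_{side} + 3\hat{\sigma}^2_{animal})}} = 3.379$$

thus there are $2(3.379)^2 = 22.8$ degrees of freedom associated with $\hat{\sigma}^2_{steak} + 3\hat{\sigma}^2_{side} + 3\hat{\sigma}^2_{animal}$.
The estimated sample size is $n = \frac{2}{3}(6.08086)[t_{0.025,22.8} + t_{0.05,22.8}]^2/1.5^2 = 25.8$, or $n = 26$ animals
per ration.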
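
The arithmetic for this sample size calculation can be sketched as a short SAS data step. This is an illustrative sketch, not part of the original analysis: the variable names are hypothetical, the inputs are the rounded values printed in Tables 24.30 and 24.36, and the results should therefore agree with the text only up to rounding (for example, the rounded estimates give about 6.0807 rather than 6.08086).

```sas
/* Sketch: Satterthwaite df and animals per ration for Example 24.6.  */
/* All variable names are illustrative; inputs are rounded table values. */
data _null_;
  /* REML variance component estimates (Table 24.30) */
  s2_animal = 1.2434;
  s2_side   = 0.1992;
  s2_steak  = 1.7529;

  /* Function of interest: steak + 3*side + 3*animal */
  est = s2_steak + 3*s2_side + 3*s2_animal;

  /* Quadratic form c'Vc with c = (3, 3, 1) in the order animal, side, steak; */
  /* V is the asymptotic covariance matrix (Table 24.36)                      */
  var_est = 9*0.3723 + 9*0.1305 + 1*0.1296
          + 2*( 9*(-0.06154) + 3*(0.002209) + 3*(-0.05370) );

  z  = est / sqrt(var_est);   /* Z-score                     */
  df = 2*z**2;                /* Satterthwaite approximation */

  /* Sample size for difference delta with error rates alpha and beta */
  delta = 1.5;
  alpha = 0.05;
  beta  = 0.05;
  n = (2/3)*est*(tinv(1 - alpha/2, df) + tinv(1 - beta, df))**2 / delta**2;
  n_animals = ceil(n);        /* round up to whole animals per ration */

  put est= var_est= z= df= n= n_animals=;
run;
```

The values written to the log should be close to the 22.8 degrees of freedom and 26 animals per ration reported above.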

TABLE 24.36

Asymptotic Covariance Matrix for the Estimates of the Variance
Components of Example 24.6

Asymptotic Covariance Matrix of Estimates

Row   Covariance Parameter            CovP1      CovP2      CovP3
1     Animal(ration)                  0.3723     -0.06154   0.002209
2     Animal × temperature(ration)    -0.06154   0.1305     -0.05370
3     Residual                        0.002209   -0.05370   0.1296


The power of the study with 10 animals per ration to detect a difference in shear force of 1.5 units
between rations within a temperature is determined by computing

$$t_{\beta,v} = \sqrt{\frac{10\,\delta^2}{\frac{2}{3}\left(\hat{\sigma}^2_{steak} + 3\hat{\sigma}^2_{side} + 3\hat{\sigma}^2_{animal}\right)}} - t_{\alpha/2,v} = \sqrt{\frac{10(1.5)^2}{\frac{2}{3}(6.08086)}} - 2.069 = 0.2869$$

For $\alpha = 0.05$, $\beta = 0.388$, providing power $= 1 - \beta = 0.612$; that is, with 10 animals per ration
there is a 61% chance of determining that two ration means are significantly different when
they actually differ by 1.5 units.
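
The power calculation can be sketched in the same way. The data step below is illustrative only: the variable names are hypothetical, and the estimate 6.08086 and the 22.8 degrees of freedom are taken as given from the text rather than recomputed.

```sas
/* Sketch: power with 10 animals per ration for a difference of 1.5 units. */
/* Variable names are illustrative; est and df are taken from the text.    */
data _null_;
  est   = 6.08086;   /* estimate of steak + 3*side + 3*animal */
  df    = 22.8;      /* Satterthwaite degrees of freedom      */
  n     = 10;        /* animals per ration                    */
  delta = 1.5;       /* difference to detect                  */
  alpha = 0.05;      /* type I error rate                     */

  /* Solve the sample size equation for t(beta, v) */
  t_beta = sqrt(n*delta**2 / ((2/3)*est)) - tinv(1 - alpha/2, df);

  /* beta is the upper-tail area beyond t_beta; power is 1 - beta */
  beta  = 1 - probt(t_beta, df);
  power = 1 - beta;

  put t_beta= beta= power=;
run;
```

Changing `n` and rerunning the step gives the approximate power for other numbers of animals per ration.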

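The second sample size problem is to determine the number of animals required to
detect a difference of 1.5 shear force units between two temperature means at the same
level of packaging averaged over the two levels of ration. The comparison of interest is
$\bar{\mu}_{\cdot11} - \bar{\mu}_{\cdot21}$. The model needed to evaluate the variance of the difference is
$\bar{f}_{\cdot\cdot kl} = \bar{\mu}_{\cdot kl} + \bar{a}_{\cdot\cdot} + \bar{s}_{\cdot\cdot k} + \bar{\varepsilon}_{\cdot\cdot kl}$. The variance of the difference is

$$
\begin{aligned}
\mathrm{Var}(\bar{f}_{\cdot\cdot11} - \bar{f}_{\cdot\cdot21}) &= \mathrm{Var}(\bar{s}_{\cdot\cdot1} - \bar{s}_{\cdot\cdot2} + \bar{\varepsilon}_{\cdot\cdot11} - \bar{\varepsilon}_{\cdot\cdot21})\\
&= \frac{2\sigma^2_{side}}{2n} + \frac{2\sigma^2_{steak}}{2n}\\
&= \frac{2}{2n}\left(\sigma^2_{side} + \sigma^2_{steak}\right)
\end{aligned}
$$

The number of animals per ration is $n = \frac{2}{2}(\hat{\sigma}^2_{steak} + \hat{\sigma}^2_{side})[t_{0.025,49.9} + t_{0.05,49.9}]^2/1.5^2 = 11.8$, where
the number of degrees of freedom was determined to be 49.9 using the procedure outlined
above.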
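
The 49.9 degrees of freedom can be reproduced from the entries of Table 24.36 with the coefficient vector $(0, 1, 1)$ for (animal, side, steak). This is a worked sketch using the rounded table values, so the last digit may differ slightly from an unrounded calculation:

$$
\begin{aligned}
\mathrm{Var}(\hat{\sigma}^2_{side} + \hat{\sigma}^2_{steak}) &= 0.1305 + 0.1296 + 2(-0.05370) = 0.1527\\
Z &= \frac{0.1992 + 1.7529}{\sqrt{0.1527}} \approx 4.996\\
v &= 2Z^2 \approx 49.9
\end{aligned}
$$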

There are several comparisons that could be important, so sample sizes need to be
determined for each comparison and the number of animals or number of whole plots needed
can be based on the maximum number of animals required for each of the comparisons
deemed to be important.


24.8 Computations Using JMP—Example 24.7 

The data set in Figures 24.9 and 24.10 is from a JMP table with nominal variables block,
variety, fert, and rate. The design of the experiment consists of four blocks of four whole
plots where the combinations of two varieties of wheat and two levels of fertilizer are
randomly assigned to the whole plots. Each whole plot is split into three subplots and the
three levels of seeding rate were randomly assigned to the three subplots. The whole plot
design is a two-way treatment structure (varieties by levels of fertilizer) in a randomized
complete block design structure (four blocks). There are some missing data points due to
uncontrollable insect damage, but the following evaluation of the structures is carried
out assuming there are no missing data points. The whole plot design analysis of variance
table is given in Table 24.37. The whole-plot error is computed from the treatment structure
by design structure interaction. In this case, there is a two-way treatment structure
with four treatment combinations; thus the whole plot error is made up of (4 - 1)(4 - 1) = 9
degrees of freedom. Next, compare the levels of seeding rate at variety 1 and fertilizer 1.
The resulting design is a one-way treatment structure in a randomized complete block
design structure and the analysis of variance table is given in Table 24.38. The subplot error
degrees of freedom are computed from the block by rate interaction, providing
6 degrees of freedom. There are four combinations of variety and fertilizer where each
provides 6 degrees of freedom for the subplot error. Pooling these four sets of 6 degrees of
freedom provides (4)(6) = 24 degrees of freedom for the subplot error. A model that can be
used to describe these data is

$$y_{ijkl} = \mu_{ijl} + b_k + w_{ijk} + \varepsilon_{ijkl}, \quad i = 1, 2,\; j = 1, 2,\; k = 1, 2, 3, 4,\; l = 1, 2, 3$$

where $b_k \sim$ i.i.d. $N(0, \sigma^2_{blk})$, $w_{ijk} \sim$ i.i.d. $N(0, \sigma^2_{wp})$, $\varepsilon_{ijkl} \sim$ i.i.d. $N(0, \sigma^2_{sp})$, and $\mu_{ijl}$ denotes the mean
of variety i, fertilizer j and seeding rate l.

JMP data table example_24_7 (48 rows; missing yields are shown as a period)

Row   block   Variety   fert   rate   yield
1     1       1         1      4      23.3
2     1       1         1      6      27.7
3     1       1         1      10     .
4     1       1         2      4      24.2
5     1       1         2      6      31
6     1       1         2      10     31.4
7     1       2         1      4      25.9
8     1       2         1      6      26.7
9     1       2         1      10     .
10    1       2         2      4      31.3
11    1       2         2      6      32.4
12    1       2         2      10     37.2
13    2       1         1      4      .
14    2       1         1      6      19.3
15    2       1         1      10     17.2
16    2       1         2      4      13.7
17    2       1         2      6      .
18    2       1         2      10     19.8
19    2       2         1      4      7.7
20    2       2         1      6      13.4
21    2       2         1      10     10.2
22    2       2         2      4      20.3
23    2       2         2      6      16.2
24    2       2         2      10     26.3
25    3       1         1      4      24.9
26    3       1         1      6      30.3
27    3       1         1      10     30.6
28    3       1         2      4      23.5
29    3       1         2      6      28.1
30    3       1         2      10     28.9
31    3       2         1      4      24.2
32    3       2         1      6      32.9

FIGURE 24.9 Data set with first 32 observations for Example 24.7.

Row   block   Variety   fert   rate   yield
33    3       2         1      10     33.3
34    3       2         2      4      .
35    3       2         2      6      23.8
36    3       2         2      10     .
37    4       1         1      4      32.2
38    4       1         1      6      37.7
39    4       1         1      10     .
40    4       1         2      4      35.5
41    4       1         2      6      .
42    4       1         2      10     41.4
43    4       2         1      4      36
44    4       2         1      6      42.1
45    4       2         1      10     42.7
46    4       2         2      4      31.4
47    4       2         2      6      28
48    4       2         2      10     32.9

FIGURE 24.10 Data set with last 16 observations for Example 24.7.


TABLE 24.37

Whole-Plot Analysis for Example 24.7

Source                 df
Block                  3
Variety                1
Fertilizer             1
Variety × fertilizer   1
Error(whole-plot)      9


TABLE 24.38

Analysis of Variance for Comparing
Rates at Variety 1 and Fertilizer 1

Source           df
Block            3
Rate             2
Error(subplot)   6

The analysis of variance table with the two error terms and the expected mean squares is
displayed in Table 24.39. The JMP fit model screen is presented in Figure 24.11. The block
and block × variety × fert terms have been declared to be random effects by using the
attributes button. The default method of estimation is REML, but the type III sums of squares
can be used by changing the method button. Click the run model button to obtain the
analysis. Figure 24.12 contains the estimates of the variance components and the analysis

TABLE 24.39

Analysis of Variance Table for Example 24.7 Assuming No Missing Data

Source                                                 df   Expected Mean Squares
Block                                                  3    $\sigma^2_{sp} + 3\sigma^2_{wp} + 12\sigma^2_{blk}$
Variety                                                1    $\sigma^2_{sp} + 3\sigma^2_{wp} + \phi^2(V)$
Fertilizer                                             1    $\sigma^2_{sp} + 3\sigma^2_{wp} + \phi^2(F)$
Variety × fertilizer                                   1    $\sigma^2_{sp} + 3\sigma^2_{wp} + \phi^2(V \times F)$
Error(whole-plot) = variety × fertilizer × block       9    $\sigma^2_{sp} + 3\sigma^2_{wp}$
Rate                                                   2    $\sigma^2_{sp} + \phi^2(R)$
Rate × variety                                         2    $\sigma^2_{sp} + \phi^2(V \times R)$
Rate × fertilizer                                      2    $\sigma^2_{sp} + \phi^2(F \times R)$
Rate × variety × fertilizer                            2    $\sigma^2_{sp} + \phi^2(V \times F \times R)$
Error(subplot) = rate × block(variety × fertilizer)    24   $\sigma^2_{sp}$


[Figure: JMP Fit Model dialog. Y: yield. Model effects listed: Variety, fert, Variety*fert,
block*Variety*fert& Random, rate, fert*rate, Variety*rate, Variety*fert*rate.
Personality: Standard Least Squares; Emphasis: Effect Leverage; Method: REML (Recommended).]

FIGURE 24.11 Fit model table for Example 24.7.


of the fixed effects. The confidence intervals about the variance components are computed
using the Wald interval rather than the Satterthwaite approximation interval used by
SAS-Mixed. All of the effects involving rate are significant, including the three-way
interaction; thus one needs to address the variety × fert × rate three-way means. Clicking on the

REML Variance Component Estimates

Random Effect        Var Ratio   Var Component   Std Error   95% Lower   95% Upper   Pct of Total
block                23.119242   64.838059       56.44131    -45.78691   175.46303   77.805
block*Variety*fert   5.5949103   15.690961       7.8980547   0.2107738   31.171148   18.829
Residual                         2.8045063       0.9820348   1.5631132   6.4330899   3.365
Total                            83.333527                                           100.000

-2 LogLikelihood = 187.42860736

Fixed Effect Tests

Source              Nparm   DF   DFDen   F Ratio   Prob > F
Variety             1       1    9.313   0.0030    0.9577
fert                1       1    9.308   0.3425    0.5723
Variety*fert        1       1    9.304   0.0054    0.9431
rate                2       2    16.6    27.9065   <.0001*
fert*rate           2       2    16.57   4.9021    0.0213*
Variety*rate        2       2    16.66   4.6008    0.0257*
Variety*fert*rate   2       2    16.66   4.1246    0.0351*

FIGURE 24.12 Estimates of the variance components (REML) and tests of the fixed effects for Example 24.7.


Variety: Least Squares Means Table

Level   Least Sq Mean   Std Error
1       27.434210       4.2835179
2       27.546696       4.2819455

fert: Least Squares Means Table

Level   Least Sq Mean   Std Error
1       26.886153       4.2812613
2       28.094752       4.2841854

Variety*fert: Least Squares Means Table

Level   Least Sq Mean   Std Error
1,1     26.754198       4.5269151
1,2     28.114222       4.5257066
2,1     27.018109       4.5171506
2,2     28.075282       4.5295541

FIGURE 24.13 Least squares means for varieties, fertilizers and their interaction for Example 24.7.


effects button provides the tables of least squares means for each effect, as shown in Figures
24.13 through 24.15. The least squares means options are displayed in Figure 24.16. The tables
option is the default. The plots option provides a line plot of the means. The contrast option
provides the ability to construct contrasts of the least squares means that are of interest.
The Tukey HSD option provides the Tukey adjustment for multiple comparisons, and the

rate: Least Squares Means Table

Level   Least Sq Mean   Std Error
4       24.600850       4.1734394
6       28.091917       4.1750166
10      29.778591       4.1816862

fert*rate: Least Squares Means Table

Level   Least Sq Mean   Std Error
1,4     23.424406       4.3146250
1,6     28.762500       4.3036552
1,10    28.471554       4.3465763
2,4     25.777295       4.3168902
2,6     27.421334       4.3338869
2,10    31.085628       4.3168902

Variety*rate: Least Squares Means Table

Level   Least Sq Mean   Std Error
1,4     23.811906       4.3146250
1,6     29.246334       4.3338869
1,10    29.244391       4.3364808
2,4     25.389795       4.3168902
2,6     26.937500       4.3036552
2,10    30.312792       4.3269197

FIGURE 24.14 Least squares means for rate and its interactions with variety and fertilizer.


Variety*fert*rate: Least Squares Means Table

Level    Least Sq Mean   Std Error
1,1,4    23.398811       4.6055992
1,1,6    28.750000       4.5643599
1,1,10   28.113782       4.6869873
1,2,4    24.225000       4.5643599
1,2,6    29.742667       4.6773808
1,2,10   30.375000       4.5643599
2,1,4    23.450000       4.5643599
2,1,6    28.775000       4.5643599
2,1,10   28.829326       4.6023424
2,2,4    27.329590       4.6140820
2,2,6    25.100000       4.5643599
2,2,10   31.796257       4.6140820

FIGURE 24.15 Least squares means for variety by fertilizer by rate interaction for Example 24.7.

[Figure: JMP least squares means options menu listing LSMeans Table, LSMeans Plot,
LSMeans Contrast, LSMeans Student's t, LSMeans Tukey HSD, Test Slices, and Power Analysis.]

FIGURE 24.16 Least squares means options of an effect.


Level    Letters            Least Sq Mean
2,2,10   A C D E G H        31.796257
1,2,10   A B D E H          30.375000
1,2,6    A B C D E F G H    29.742667
2,1,10   A B C D F          28.829326
2,1,6    A B C D F          28.775000
1,1,6    A B C E F G        28.750000
1,1,10   A B C D E F G H    28.113782
2,2,4    A B C D E F G H    27.329590
2,2,6    B F                25.100000
1,2,4    C F G              24.225000
2,1,4    E G H              23.450000
1,1,4    D H                23.398811

Levels not connected by same letter are significantly different.

FIGURE 24.17 Least squares means for the three-way interaction with letters denoting significant groupings via
Tukey's method.


slice option provides a test of the equality of means for each level of each effect in an
interaction. The results in Figure 24.17 are part of the Tukey HSD option results with lines or
letters provided. Since there are missing data in the study, the letters may result in some
contradictions. For example, mean 2,2,10 is significantly different from mean 2,2,6 (see A
and B) but 2,2,10 is not significantly different from 1,2,4 (see C) even though the mean for
2,2,6 is larger than the mean for 1,2,4. Missing data cause the variances of comparisons
to be different, which in turn can provide multiple comparisons that are not completely
ordered, as demonstrated above.

Figure 24.18 contains the complete set of results of applying the Tukey HSD option to the
Variety × fert means. The table contains the estimated differences, the estimated standard
errors of each difference and upper and lower 95% confidence limits about each
difference. When a confidence interval does not include zero, the results are displayed in red.
All of these confidence intervals include zero, so they are displayed in black. Contrasts of
the least squares means can be evaluated by selecting coefficients for each mean on the
contrast table displayed in Figure 24.19.

LSMeans Differences Tukey HSD, α = 0.050

Level i   Quantity          j = 1,1    j = 1,2    j = 2,1    j = 2,2
1,1       Mean[i]-Mean[j]   0          -1.36      -0.2639    -1.3211
          Std Err Dif       0          2.92504    2.91177    2.93117
          Lower CL Dif      0          -10.418    -9.3109    -10.388
          Upper CL Dif      0          7.69757    8.7831     7.74557
1,2       Mean[i]-Mean[j]   1.36002    0          1.09611    0.03894
          Std Err Dif       2.92504    0          2.90996    2.9293
          Lower CL Dif      -7.6976    0          -7.9498    -9.0271
          Upper CL Dif      10.4176    0          10.142     9.10499
2,1       Mean[i]-Mean[j]   0.26391    -1.0961    0          -1.0572
          Std Err Dif       2.91177    2.90996    0          2.91595
          Lower CL Dif      -8.7831    -10.142    0          -10.113
          Upper CL Dif      9.31092    7.94977    0          7.99889
2,2       Mean[i]-Mean[j]   1.32108    -0.0389    1.05717    0
          Std Err Dif       2.93117    2.9293     2.91595    0
          Lower CL Dif      -7.7456    -9.105     -7.9989    0
          Upper CL Dif      10.3877    9.02711    10.1132    0

Level   Letters   Least Sq Mean
1,2     A         28.114222
2,2     A         28.075282
2,1     A         27.018109
1,1     A         26.754198

Levels not connected by same letter are significantly different.

FIGURE 24.18 Computations for Tukey's method of multiple comparisons for the variety by fertilizer means
for Example 24.7.


This example shows some of the options that are available for analyzing split-plot type 
designs using the JMP software. All of the data sets and models described in this chapter 
can be analyzed using the fit model screen of JMP. 
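
For readers working in SAS rather than JMP, a roughly equivalent analysis can be sketched in the style of the SAS-Mixed code used for the earlier examples. This is an illustrative sketch, not code from the original example: the data set name example_24_7 is hypothetical, the variable names are assumed to match the JMP table in Figures 24.9 and 24.10, and the denominator degrees of freedom and confidence intervals need not match the JMP output exactly.

```sas
/* Sketch only: the data set name example_24_7 is hypothetical and the   */
/* variable names are assumed to follow the JMP table (Figures 24.9-24.10). */
proc mixed data=example_24_7 covtest cl;
  class block variety fert rate;
  /* Full three-way factorial treatment structure */
  model yield = variety|fert|rate / ddfm=kr;
  /* block and the whole-plot error (block*variety*fert) are random */
  random block block*variety*fert;
  /* Whole-plot means and three-way means with Tukey-adjusted comparisons */
  lsmeans variety*fert / diff adjust=tukey;
  lsmeans variety*fert*rate / diff adjust=tukey;
run;
```

The random statement mirrors the two random effects declared in the JMP fit model screen, and the residual plays the role of the subplot error.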


24.9 Concluding Remarks 

In this chapter, split-plot designs were discussed in detail with the analyses being carried 
out using SAS-Mixed and JMP. The analysis of variance tables for the designs discussed 
involve more than one error term and methods are described that enable one to construct 
the appropriate representation of each error term. Estimated standard errors of various 
estimates can involve more than one error term. Methods were presented for computing 
estimated standard errors for the various types of comparisons that may be interesting to 
an experimenter. Estimated standard errors can also be used for carrying out multiple 
comparisons. Methods are described for estimating linear, quadratic, and similar trends 
when the levels of a factor are quantitative. Sample size and power methods are
demonstrated. Seven examples were discussed in detail.

[Figure: JMP Contrast Specification window for Variety*fert*rate, listing the twelve levels
1,1,4; 1,1,6; 1,1,10; 1,2,4; 1,2,6; 1,2,10; 2,1,4; 2,1,6; 2,1,10; 2,2,4; 2,2,6; 2,2,10,
with the instruction "Click on + or - to make contrast values."]

FIGURE 24.19 Contrast option window for least squares means.


24.10 Exercises 

24.1 A baker wanted to determine the effect that the amount of fat in a recipe of 
cookie dough would have on the texture of the cookie. She also wanted to determine 
if the temperature (°F) at which the cookies were baked would have an 
influence on the texture of the surface. The texture of the cookie is measured by 
determining the amount of force (g) required to penetrate the cookie surface. 
The process she used was to make a batch of cookie dough for each of the four 
recipes and bake one cookie from each batch in the oven at one time. She carried 
this process out each of four days when she baked cookies at three different 
temperatures. The following are the forces required to penetrate the cookie surface. 


Data for Exercise 24.1. Data Are Force (g) to Penetrate a Cookie Surface 

Day   Temperature   2% Fat   4% Fat   6% Fat   8% Fat
1     350            7.4      7.1      7.2      6.7
1     375           11.2     11.1     10.6     10.8
1     400           11.8     11.2     11.3     11.4
2     350            8.7      8.1      7.4      7.2
2     375           10.8     10.1     10.0     10.2
2     400           12.9     14.0     13.6     12.7
3     350            7.0      6.4      6.2      6.1
3     375            8.3      7.7      7.4      8.4
3     400           12.1     12.1     12.2     12.0
4     350            6.6      6.7      6.9      6.0
4     375            9.1      8.1      8.3      8.0
4     400           11.7     12.1     11.9     11.4

1) Key out the analysis of variance table of this study. 

2) Carry out an analysis of variance and determine the important fixed effects 
in the model. 

3) Complete the analysis by carrying out multiple comparisons on the important 
effects in the model. 

4) Fit a response surface model and determine a good model to predict force as 
a function of percentage fat and temperature. 

5) Determine the number of times (days) she would have to repeat this 
experiment so that she could detect a difference of one unit in force in the 
temperature means within a given recipe using the estimate of the variance 
components from this data and type I and type II errors of 0.05 and 0.20, 
respectively. 

24.2 An agriculture engineer studied the amount of water that would flow through a 
particular type of nozzle at a fixed pressure in the line to which the nozzle was 
attached. Three different nozzle types were used where one had a round hole, 
one an oval hole and the other a narrow slit. The areas of the openings were 
equal in size. The study involved setting the pressure in the system to a fixed 
value and then attaching each nozzle to the line (in a random order). The amount 
of water that flowed through the nozzle in 20 s was measured (ounces). The 
pressure was selected at random (10, 20, 40, and 80 psi) and all four pressures 
were evaluated. The whole process was repeated three times (blocks). The data 
are given in the following table. 


Data for Exercise 24.2 with Three Nozzle Types 
(R, O, and S) and Ounces of Water per Trial 

Block   Pressure   R      O      S
1       10          4.4    4.5    3.2
1       20          5.7    5.0    3.9
1       40          7.5    6.2    5.2
1       80         10.7   11.0    9.9
2       10          6.1    6.7    6.5
2       20          9.4    7.4    8.2
2       40          9.6    7.3    6.3
2       80         14.6   13.2   12.1
3       10          2.4    3.0    1.6
3       20          3.4    2.5    2.3
3       40          5.0    2.3    1.9
3       80         11.5   11.5   10.2


1) Key out the analysis of variance table for this study. 

2) Carry out an analysis of variance and evaluate the presence of interaction 
among the levels of pressure and the three nozzle types. 

3) Determine if there is a linear or quadratic effect of pressure for each of the 
three nozzle types. 

24.3 The agricultural engineer who evaluated the nozzles in Exercise 24.2 decided 
that more data were needed and ran some additional trials. In this study he used 
three pressures, 40, 60, and 80 psi, as well as the three shapes of holes in the 
nozzles, R, O, and S. The process for this study was to place a nozzle type on the 
line and then determine the amount of water through the nozzle at each of the 
three pressures where the order of the pressures was selected at random for each 
nozzle type. He repeated the study for two blocks. The amount of water in the 
20 s time interval was recorded as in the following table. 


Data for Exercise 24.3 with Three Nozzle Types (R, O, 
and S) and Three Line Pressures (P-40, P-60, and 
P-80). The Data Are in Ounces of Water per Trial 

Block   Nozzle   P-40   P-60   P-80
1       O        6.9     8.7   12.2
1       R        9.7    11.1   12.6
1       S        6.7     8.6   12.5
2       O        8.0     9.7   12.8
2       R        9.2    11.9   12.3
2       S        7.1     9.1   13.4
3       O        6.7     8.1   11.6
3       R        8.1    10.0   11.8
3       S        4.0     5.6    9.5


1) Key out the analysis of variance table for this study. 

2) Carry out an analysis of variance and evaluate the presence of interaction 
among the levels of pressure and the three nozzle types. 

3) Determine if there is a linear or quadratic effect of pressure for each of the 
three nozzle types. 

24.4 Carry out a combined analysis of the data in Exercises 24.2 and 24.3. Discuss the 
method of analysis and determine if there is a linear or quadratic effect of 
pressure on the nozzle type. 

24.5 An enhanced after-school exercise program was used to see if the participating 
children would do more moderate or better exercise than those students in a 
regular after-school program. The students of interest are those classified as 
coming from low-income families. Ten schools were selected, six large and four 
small, and three of the large and two of the small schools were randomly selected 
to receive the enhanced exercise after-school program. The other five schools 
carried out their regular after-school program. Both female and male students 
were from either the second, third, or fourth grades. Schools numbered 1-3 and 


Data for Exercise 24.5 for Female Students Using the Regular Exercise Program 

School 1         School 2         School 3         School 4         School 5
cl4  cl2  cl3    cl4  cl2  cl3    cl4  cl2  cl3    cl4  cl2  cl3    cl4  cl2  cl3
19   15   18     18   18   17     21   17   17     13   14   11     13   20   14
18   18   17     18   17   17     16   15   18     11   15   13     13   17   16
18   18   17     15   20   14     18   —    13     13   —    12     13   —    18
18   15   —      18   14   —      —    —    12     10   —    12     14   —    16
14   15   —      —    —    —      —    —    12     13   —    12     —    —    —
19   —    —      —    —    —      —    —    —      —    —    13     —    —    —
Data for Exercise 24.5 for Male Students Using the Regular Exercise Program 

School 1         School 2         School 3         School 4         School 5
cl4  cl4  cl4    cl4  cl4  cl4    cl4  cl4  cl4    cl4  cl4  cl4    cl4  cl4  cl4
12   20   16     10   16   13     14   16   13     9    14   11     12   15   17
13   18   14     14   16   11     11   16   11     10   14   10     10   17   14
16   13   12     13   15   16     —    13   10     8    11   12     12   18   16
14   20   —      12   17   —      —    14   12     9    16   9      —    —    —
—    —    —      13   —    —      —    12   10     7    13   10     —    —    —
—    —    —      11   —    —      —    —    —      6    14   —      —    —    —


Data for Exercise 24.5 for Female Students Using the Enhanced Exercise Program 

School 6         School 7         School 8         School 9         School 10
cl4  cl4  cl4    cl4  cl4  cl4    cl4  cl4  cl4    cl4  cl4  cl4    cl4  cl4  cl4
22   17   18     25   17   22     20   20   24     22   17   24     17   18   17
21   21   16     21   22   22     21   24   19     22   21   20     22   19   20
18   23   19     21   20   22     17   —    22     18   18   —      21   22   22
—    23   19     26   20   18     —    —    22     25   20   —      —    —    —
—    —    21     20   —    20     —    —    24     21   —    —      —    —    —
—    —    —      —    —    23     —    —    —      19   —    —      —    —    —


Data for Exercise 24.5 for Male Students Using the Enhanced Exercise Program 

School 6         School 7         School 8         School 9         School 10
cl4  cl4  cl4    cl4  cl4  cl4    cl4  cl4  cl4    cl4  cl4  cl4    cl4  cl4  cl4
16   19   18     13   12   15     13   14   22     19   13   15     12   17   16
12   17   13     16   14   13     13   19   19     14   16   16     8    15   15
13   20   10     12   18   15     —    18   16     —    15   19     —    20   —
16   —    —      16   15   18     —    20   —      —    —    14     —    16   —
14   —    —      —    —    18     —    16   —      —    —    17     —    19   —
—    —    —      —    —    —      —    —    —      —    —    15     —    19   —


6-8 denote the large schools and schools numbered 4-5 and 9-10 denote the 
small schools. The data are given above. 

1) Assuming the data set is such that there are six males and six females per 
class, key out the analysis of variance table and include the numbers of 
degrees of freedom. 

2) For the analysis of variance table in part (1) determine the computational 
representation of the three error terms. 

3) Carry out an analysis of variance using the data set and include all needed 
comparisons where the Tukey adjustment is used to account for multiple 
comparisons. 







Methods for Analyzing Strip-Plot Type Designs 


Strip-plot design structures are used in situations where the experimental units are 
arranged in rectangles, such as in a field, on a piece of fabric, or with a stack of cages. The 
strip-plot design was described in Chapter 5 as one of the basic design structures and the 
details of the analysis are discussed in this chapter. A strip-plot design structure occurs 
when the levels of one factor are applied to the rows of each rectangle and the levels of 
another factor are applied to the columns of each rectangle. Thus, this design is useful 
when the method of application of treatments does not allow the experimental units to be 
treated individually but rather in groups such as a whole row of units or a whole column 
of units. The strip-plot design structure is also useful for experiments where the 
experimental units are processed in a series of steps where groups of experimental units are 
treated together within each step. Six examples are discussed to demonstrate the analyses 
of complex design structures. The first example involving levels of both irrigation and 
nitrogen is used to demonstrate the basic analysis of the simplest strip-plot design. Most of 
the remaining examples involve various combinations of strip-plot and split-plot design 
structures. The last example involves a strip-strip-plot design structure. 


25.1 Description of the Strip-Plot Design and Model 

A strip-plot design structure is similar to a split-plot design structure, but the 
experimental units are constructed differently. The strip-plot involves at least a two-way treatment 
structure where the basic experimental units are arranged in sets of rectangles. Each 
rectangle has $a$ rows where $a$ is the number of levels of the first factor (A) and $c$ columns where 
c is the number of levels of the second factor (C). The levels of factor A are randomly 
assigned to rows of the rectangle where all of the experimental units within a row receive 
the same level of A and are treated together. The levels of factor C are randomly assigned 
to the columns of the rectangle so that all of the experimental units within a column receive 
the same level of C and are treated together. A schematic showing the assignment of the 
levels of A to the rows and the levels of C to the columns is shown in Figure 25.1, where 

FIGURE 25.1 The levels of two factors are assigned to strips (rows and columns) of experimental units arranged 
in a rectangle. 

there are r such rectangles representing blocked replicates. This assignment process 
generates a design with three sizes of experimental units. The rows are the experimental 
units for factor A, the columns are the experimental units for factor C, and cells are the 
experimental units for the A x C interaction. 

An easy way to visualize the appropriate analysis is to first ignore the levels of factor C 
and just look at the rectangles with the levels of A assigned to the rows, as displayed in 
Figure 25.2. The row design consists of a one-way treatment structure in a randomized 


FIGURE 25.2 Assignment of the levels of A to the rows of each rectangle. 
complete block design structure. A model that can be used to describe data from the row 
design is 

$$y^r_{ik} = \mu^r_i + b^r_k + e^r_{ik}, \quad i = 1, 2, \ldots, a, \; k = 1, 2, \ldots, r$$

where the block effects are $b^r_k \sim$ i.i.d. $N(0, \sigma^2_{block})$ and the row effects are $e^r_{ik} \sim$ i.i.d. $N(0, \sigma^2_{row})$. 

The analysis of variance table for the row design is given in Table 25.1 where the row 
error term is computed as the rectangle x A interaction. 

Next, ignore the levels of factor A and look at the rectangles with the levels of C assigned 
to the columns, as displayed in Figure 25.3. The column design consists of a one-way 
treatment structure in a randomized complete block design structure. A model that can be 
used to describe data from the column design is 

$$y^c_{jk} = \mu^c_j + b^c_k + e^c_{jk}, \quad j = 1, 2, \ldots, c, \; k = 1, 2, \ldots, r$$


TABLE 25.1 

Analysis of Variance for the Row Part of 
the Strip-Plot Design Structure 

Source                    df
Block                     r - 1
A                         a - 1
Error(row) = A x block    (a - 1)(r - 1)


FIGURE 25.3 Assignment of the levels of C to the columns of each rectangle. 
where the block effects are $b^c_k \sim$ i.i.d. $N(0, \sigma^2_{block})$ and the column effects are $e^c_{jk} \sim$ i.i.d. 
$N(0, \sigma^2_{column})$. The analysis of variance table for the column design is given in Table 25.2, 
where the column error term is computed as the rectangle x C interaction. Comparisons of the 
levels of factor A are between-row comparisons, those of factor C are between-column 
comparisons, but measures of A x C interaction are within-row comparisons by within-column 
comparisons. Thus the size of the experimental unit on which the A x C interaction 
is measured is the intersection of a row and a column, which corresponds to a cell within 
the rectangle. 

A model that describes the strip-plot design structure with two factors in the treatment 
structure is 


$$y_{ijk} = \mu + b_k + \alpha_i + r_{ik} + \gamma_j + c_{jk} + (\alpha\gamma)_{ij} + \varepsilon_{ijk}, \quad i = 1, 2, \ldots, a, \; j = 1, 2, \ldots, c, \; k = 1, 2, \ldots, r$$

where $\mu$ denotes the overall mean effect, $\alpha_i$ denotes the effect of the $i$th level of factor 
A, $\gamma_j$ denotes the effect of the $j$th level of factor C, $(\alpha\gamma)_{ij}$ denotes the interaction effect between the levels 
of factors A and C, $b_k$ denotes the effect of the $k$th block assumed to be distributed $b_k \sim$ i.i.d. 
$N(0, \sigma^2_{block})$, $r_{ik}$ denotes the effect of the $i$th row within the $k$th block assumed to be 
distributed $r_{ik} \sim$ i.i.d. $N(0, \sigma^2_{row})$, $c_{jk}$ denotes the effect of the $j$th column within the $k$th block assumed 
to be distributed $c_{jk} \sim$ i.i.d. $N(0, \sigma^2_{column})$, and $\varepsilon_{ijk}$ denotes the random effect associated with 
the $ij$th cell within the $k$th block assumed to be distributed as $\varepsilon_{ijk} \sim$ i.i.d. $N(0, \sigma^2_{cell})$. In 
addition, all random terms are assumed to be independently distributed. 
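
The way the row, column, and cell effects enter the model can be made concrete with a small simulation. The sketch below is only an illustration under stated assumptions: the function name, the cell means, and the standard deviations are hypothetical and are not taken from the text. The point is that one row effect is shared by every cell in a row of a block and one column effect by every cell in a column of a block.

```python
import random


def simulate_strip_plot(mu, r, sd_block, sd_row, sd_col, sd_cell, seed=0):
    """Return y[k][i][j] drawn from y_ijk = mu_ij + b_k + r_ik + c_jk + e_ijk.

    mu is an a-by-c table of cell means; r is the number of blocks.
    """
    a, c = len(mu), len(mu[0])
    rng = random.Random(seed)
    data = []
    for _ in range(r):
        b = rng.gauss(0.0, sd_block)                       # block effect b_k
        row = [rng.gauss(0.0, sd_row) for _ in range(a)]   # r_ik, one per row strip
        col = [rng.gauss(0.0, sd_col) for _ in range(c)]   # c_jk, one per column strip
        block = [
            [mu[i][j] + b + row[i] + col[j] + rng.gauss(0.0, sd_cell)  # cell error e_ijk
             for j in range(c)]
            for i in range(a)
        ]
        data.append(block)
    return data


# Hypothetical cell means for a = 3 rows and c = 2 columns, observed in r = 4 blocks.
cell_means = [[30.0, 40.0], [33.0, 44.0], [38.0, 50.0]]
y = simulate_strip_plot(cell_means, r=4, sd_block=2.0, sd_row=1.0, sd_col=1.5, sd_cell=1.0)
```

Setting all four standard deviations to zero returns the table of cell means in every block, which is a quick check that the random terms enter only additively.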

The row error is computed as the A by block interaction, the column error term is 
computed as the C by block interaction and the cell error term is computed as the A by C by 
block three-way interaction. The analysis of variance table is shown in Table 25.3, which is 


TABLE 25.2 

Analysis of Variance for the Column Part 
of the Strip-Plot Design Structure 

Source                       df
Block                        r - 1
C                            c - 1
Error(column) = C x block    (c - 1)(r - 1)


TABLE 25.3 

Analysis of Variance Table for the Strip-Plot Design Structure with a 
Two-Way Treatment Structure 

Source           df                       EMS
Block            r - 1                    $\sigma^2_{cell} + c\sigma^2_{row} + a\sigma^2_{column} + ac\sigma^2_{block}$
A                a - 1                    $\sigma^2_{cell} + c\sigma^2_{row} + \varphi^2(\alpha)$
Error(row)       (a - 1)(r - 1)           $\sigma^2_{cell} + c\sigma^2_{row}$
C                c - 1                    $\sigma^2_{cell} + a\sigma^2_{column} + \varphi^2(\gamma)$
Error(column)    (c - 1)(r - 1)           $\sigma^2_{cell} + a\sigma^2_{column}$
AC               (a - 1)(c - 1)           $\sigma^2_{cell} + \varphi^2(\alpha\gamma)$
Error(cell)      (a - 1)(c - 1)(r - 1)    $\sigma^2_{cell}$
constructed by taking a combination of the entries from Tables 25.1 and 25.2 with the A by C 
interaction and the cell error. The strip-plot model can be expressed as a means model as 

$$y_{ijk} = \mu_{ij} + b_k + r_{ik} + c_{jk} + \varepsilon_{ijk}, \quad i = 1, 2, \ldots, a, \; j = 1, 2, \ldots, c, \; k = 1, 2, \ldots, r$$

where $\mu_{ij} = \mu + \alpha_i + \gamma_j + (\alpha\gamma)_{ij}$. 
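
The degrees of freedom in Table 25.3 can be tabulated for any choice of $a$, $c$, and $r$. The following is a minimal sketch; the function name is hypothetical, and the values $a = 3$, $c = 2$, $r = 4$ are used only because they match the nitrogen by irrigation example later in the chapter. The degrees of freedom should add up to $acr - 1$.

```python
def strip_plot_df(a, c, r):
    """Degrees of freedom for each line of the strip-plot analysis of variance."""
    return {
        "Block": r - 1,
        "A": a - 1,
        "Error(row)": (a - 1) * (r - 1),        # A x block
        "C": c - 1,
        "Error(column)": (c - 1) * (r - 1),     # C x block
        "A x C": (a - 1) * (c - 1),
        "Error(cell)": (a - 1) * (c - 1) * (r - 1),  # A x C x block
    }


df = strip_plot_df(a=3, c=2, r=4)
total_df = sum(df.values())   # equals a*c*r - 1 for a balanced design

for source, value in df.items():
    print(f"{source:<14} {value}")
```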

For a balanced data set, the estimates of the cell means are $\hat\mu_{ij} = \bar y_{ij\cdot}$, and various contrasts, 
sums, or means involving the $\mu_{ij}$ are estimated by taking the corresponding contrasts, sums, 
or means of the $\hat\mu_{ij}$. The method of moments solution for the variance components is 
obtained from the equations generated by equating the mean squares to the expected mean 
squares in Table 25.3, giving 

$$\tilde\sigma^2_{cell} = MSError(cell)$$

$$\tilde\sigma^2_{row} = \frac{MSError(row) - MSError(cell)}{c}$$

$$\tilde\sigma^2_{column} = \frac{MSError(column) - MSError(cell)}{a}$$

$$\tilde\sigma^2_{block} = \frac{MSBlock - MSError(column) - MSError(row) + MSError(cell)}{ac}$$


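The estimates of the variance components are 

$$\hat\sigma^2_{cell} = \tilde\sigma^2_{cell},$$

$$\hat\sigma^2_{row} = \begin{cases} \tilde\sigma^2_{row} & \text{if } \tilde\sigma^2_{row} > 0 \\ 0 & \text{if } \tilde\sigma^2_{row} \le 0, \end{cases}$$

$$\hat\sigma^2_{column} = \begin{cases} \tilde\sigma^2_{column} & \text{if } \tilde\sigma^2_{column} > 0 \\ 0 & \text{if } \tilde\sigma^2_{column} \le 0, \end{cases}$$

and 

$$\hat\sigma^2_{block} = \begin{cases} \tilde\sigma^2_{block} & \text{if } \tilde\sigma^2_{block} > 0 \\ 0 & \text{if } \tilde\sigma^2_{block} \le 0. \end{cases}$$
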

Care must be taken when making inferences about the parameters for this model since 
the model does involve three error terms, one for each size of experimental unit. The 
necessary tests of hypotheses and appropriate standard errors for making comparisons 
among the means are discussed in the next section. 
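
The method of moments solution and the truncation at zero can be sketched in a few lines. The function name below is hypothetical; the mean squares are the sums of squares divided by their degrees of freedom from Table 25.6 of the nitrogen by irrigation example (Section 25.3), where nitrogen plays the role of factor A ($a = 3$) and irrigation the role of factor C ($c = 2$).

```python
def moment_estimates(ms_block, ms_row, ms_col, ms_cell, a, c):
    """Method of moments estimates of the variance components, truncated at zero."""
    raw = {
        "cell": ms_cell,
        "row": (ms_row - ms_cell) / c,
        "column": (ms_col - ms_cell) / a,
        "block": (ms_block - ms_col - ms_row + ms_cell) / (a * c),
    }
    # A negative solution is replaced by zero.
    return {name: max(value, 0.0) for name, value in raw.items()}


estimates = moment_estimates(
    ms_block=123.458333 / 3,   # BLK
    ms_row=16.916667 / 6,      # BLK x N, the row error
    ms_col=32.791667 / 3,      # BLK x IRR, the column error
    ms_cell=8.583333 / 6,      # Residual, the cell error
    a=3,
    c=2,
)

for name, value in estimates.items():
    print(f"{name:<7} {value:.4f}")
```

For these balanced data all four solutions are positive, so they agree with the REML estimates reported in Table 25.8.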


25.2 Techniques for Making Inferences 

The column of expected mean squares in Table 25.3 provides the information necessary to 
construct test statistics to make inferences about the factorial effects. To test the no-interaction 
hypothesis $H_{0AC}: \mu_{ij} - \mu_{i'j} - \mu_{ij'} + \mu_{i'j'} = 0$ for all $i \ne i'$ and $j \ne j'$ vs $H_{aAC}$: (not $H_{0AC}$), divide the 
mean square A x C by MSError(cell). To test the equal main effect means hypothesis for 
the levels of A, $H_{0A}: \bar\mu_{1\cdot} = \bar\mu_{2\cdot} = \cdots = \bar\mu_{a\cdot}$ vs $H_{aA}$: (not $H_{0A}$), divide the mean square for factor A 
by MSError(row). To test the equal means hypothesis for the levels of C, $H_{0C}: \bar\mu_{\cdot 1} = \bar\mu_{\cdot 2} = \cdots = \bar\mu_{\cdot c}$ 
vs $H_{aC}$: (not $H_{0C}$), divide the mean square for factor C by MSError(column). 
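
The pairing of each effect with its own error term can be sketched as follows. The function name is hypothetical; the mean squares are computed from the sums of squares and degrees of freedom in Table 25.6 of the nitrogen by irrigation example, with nitrogen as factor A and irrigation as factor C. Only the F ratios are computed here; p-values would require an F distribution routine.

```python
def strip_plot_f_ratios(ms):
    """F ratios for A, C, and A x C, each divided by its own error mean square."""
    return {
        "A": ms["A"] / ms["Error(row)"],
        "C": ms["C"] / ms["Error(column)"],
        "A x C": ms["A x C"] / ms["Error(cell)"],
    }


mean_squares = {
    "A": 339.083333 / 2,             # N
    "Error(row)": 16.916667 / 6,     # BLK x N
    "C": 570.375000 / 1,             # IRR
    "Error(column)": 32.791667 / 3,  # BLK x IRR
    "A x C": 94.750000 / 2,          # N x IRR
    "Error(cell)": 8.583333 / 6,     # Residual
}

f_ratios = strip_plot_f_ratios(mean_squares)
for effect, value in f_ratios.items():
    print(f"{effect:<6} F = {value:.2f}")
```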

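As for the split-plot design discussed in Chapter 24, where there were four comparisons 
of the means with different standard errors, there are six comparisons of the means with 
different standard errors for the strip-plot design. The comparisons, estimators of the 
comparisons based on a balanced data set, and the variances of the comparisons are given in 
Table 25.4. The method described in Chapter 24 was used to obtain the specific variances 
for each contrast of the cell means. To demonstrate, the variance of the difference of two 
means where the levels of A and C are different for both means, such as $\hat\mu_{12} - \hat\mu_{34}$, is derived 
by first obtaining the model representation for $\hat\mu_{ij} = \bar y_{ij\cdot}$ as $\hat\mu_{ij} = \mu_{ij} + \bar b_{\cdot} + \bar r_{i\cdot} + \bar c_{j\cdot} + \bar\varepsilon_{ij\cdot}$. Then the 
model for $\hat\mu_{12} - \hat\mu_{34}$ is 

$$\hat\mu_{12} - \hat\mu_{34} = \bar y_{12\cdot} - \bar y_{34\cdot} = \mu_{12} - \mu_{34} + \bar r_{1\cdot} - \bar r_{3\cdot} + \bar c_{2\cdot} - \bar c_{4\cdot} + \bar\varepsilon_{12\cdot} - \bar\varepsilon_{34\cdot}$$

The estimate of $\mu_{12} - \mu_{34}$ is $\bar y_{12\cdot} - \bar y_{34\cdot}$ and the variance of $\hat\mu_{12} - \hat\mu_{34}$ is 

$$Var(\hat\mu_{12} - \hat\mu_{34}) = Var(\bar r_{1\cdot} - \bar r_{3\cdot} + \bar c_{2\cdot} - \bar c_{4\cdot} + \bar\varepsilon_{12\cdot} - \bar\varepsilon_{34\cdot}) = \frac{2\sigma^2_{row}}{r} + \frac{2\sigma^2_{column}}{r} + \frac{2\sigma^2_{cell}}{r} = \frac{2(\sigma^2_{cell} + \sigma^2_{column} + \sigma^2_{row})}{r}$$

The estimate of the variance of $\hat\mu_{12} - \hat\mu_{34}$ is 

$$\widehat{Var}(\hat\mu_{12} - \hat\mu_{34}) = \frac{2(\hat\sigma^2_{cell} + \hat\sigma^2_{column} + \hat\sigma^2_{row})}{r}$$

When $\hat\sigma^2_{row} > 0$ and $\hat\sigma^2_{column} > 0$ the estimate of the variance is 

$$\widehat{Var}(\hat\mu_{12} - \hat\mu_{34}) = \frac{2}{acr}\left[a\,MSError(row) + c\,MSError(column) + (ac - a - c)\,MSError(cell)\right]$$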
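
The step from the variance components to the mean squares is only algebra. As a sketch, substituting the method of moments solutions from Section 25.1 (assuming none of them was truncated at zero) gives:

```latex
\begin{aligned}
\hat\sigma^2_{cell} + \hat\sigma^2_{column} + \hat\sigma^2_{row}
  &= MSError(cell)
   + \frac{MSError(column) - MSError(cell)}{a}
   + \frac{MSError(row) - MSError(cell)}{c} \\
  &= \frac{a\,MSError(row) + c\,MSError(column) + (ac - a - c)\,MSError(cell)}{ac},
\end{aligned}
```

so that multiplying by $2/r$ yields the factor $2/(acr)$ in front of the bracketed combination of mean squares.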

The six types of comparisons, their estimators and their corresponding variances are given 
in Table 25.4. The estimates of the standard errors for the comparisons are given in Table 25.5 

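TABLE 25.4 

Comparisons, Estimators, and the Variance of the Estimators 
for the Strip-Plot Design for $i \ne m$ and $j \ne n$ 

Comparison                                       Estimator                                                                  Variance
$\bar\mu_{i\cdot} - \bar\mu_{m\cdot}$            $\bar y_{i\cdot\cdot} - \bar y_{m\cdot\cdot}$                              $2(\sigma^2_{cell} + c\sigma^2_{row})/cr$
$\bar\mu_{\cdot j} - \bar\mu_{\cdot n}$          $\bar y_{\cdot j\cdot} - \bar y_{\cdot n\cdot}$                            $2(\sigma^2_{cell} + a\sigma^2_{column})/ar$
$\mu_{ij} - \mu_{in} - \mu_{mj} + \mu_{mn}$      $\bar y_{ij\cdot} - \bar y_{in\cdot} - \bar y_{mj\cdot} + \bar y_{mn\cdot}$   $4\sigma^2_{cell}/r$
$\mu_{ij} - \mu_{in}$                            $\bar y_{ij\cdot} - \bar y_{in\cdot}$                                      $2(\sigma^2_{cell} + \sigma^2_{column})/r$
$\mu_{ij} - \mu_{mj}$                            $\bar y_{ij\cdot} - \bar y_{mj\cdot}$                                      $2(\sigma^2_{cell} + \sigma^2_{row})/r$
$\mu_{ij} - \mu_{mn}$                            $\bar y_{ij\cdot} - \bar y_{mn\cdot}$                                      $2(\sigma^2_{cell} + \sigma^2_{column} + \sigma^2_{row})/r$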
TABLE 25.5 

Estimated Standard Errors of the Comparisons for the Strip-Plot Design for $i \ne m$ and $j \ne n$ with 
Associated Degrees of Freedom 

Comparison                                     Estimator                                                                               df
$\bar\mu_{i\cdot} - \bar\mu_{m\cdot}$          $\sqrt{\frac{2}{cr}[MSError(row)]}$                                                     $(a - 1)(r - 1)$
$\bar\mu_{\cdot j} - \bar\mu_{\cdot n}$        $\sqrt{\frac{2}{ar}[MSError(column)]}$                                                  $(c - 1)(r - 1)$
$\mu_{ij} - \mu_{in} - \mu_{mj} + \mu_{mn}$    $\sqrt{\frac{4}{r}[MSError(cell)]}$                                                     $(a - 1)(c - 1)(r - 1)$
$\mu_{ij} - \mu_{in}$                          $\sqrt{\frac{2}{ar}[MSError(column) + (a - 1)MSError(cell)]}$                           $\phi_1$
$\mu_{ij} - \mu_{mj}$                          $\sqrt{\frac{2}{cr}[MSError(row) + (c - 1)MSError(cell)]}$                              $\phi_2$
$\mu_{ij} - \mu_{mn}$                          $\sqrt{\frac{2}{acr}[a\,MSError(row) + c\,MSError(column) + (ac - a - c)MSError(cell)]}$   $\phi_3$

where the degrees of freedom correspond to those either from a single mean square or for 
a combination of mean squares computed by using the Satterthwaite approximation as 

$$\phi_1 = \frac{[MSError(column) + (a - 1)MSError(cell)]^2}{\dfrac{[MSError(column)]^2}{(c - 1)(r - 1)} + \dfrac{[(a - 1)MSError(cell)]^2}{(a - 1)(c - 1)(r - 1)}}$$

$$\phi_2 = \frac{[MSError(row) + (c - 1)MSError(cell)]^2}{\dfrac{[MSError(row)]^2}{(a - 1)(r - 1)} + \dfrac{[(c - 1)MSError(cell)]^2}{(a - 1)(c - 1)(r - 1)}}$$

and 

$$\phi_3 = \frac{[a\,MSError(row) + c\,MSError(column) + (ac - a - c)MSError(cell)]^2}{\dfrac{[a\,MSError(row)]^2}{(a - 1)(r - 1)} + \dfrac{[c\,MSError(column)]^2}{(c - 1)(r - 1)} + \dfrac{[(ac - a - c)MSError(cell)]^2}{(a - 1)(c - 1)(r - 1)}}$$

When there are missing data, there will be many more different types of standard errors 
depending on the pattern of the missing data. 
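
The standard errors and Satterthwaite degrees of freedom for the three cell-mean comparisons that combine error terms can be sketched as below. The function names are hypothetical, and the coefficients follow from the method of moments solutions in Section 25.1: $\sigma^2_{cell} + \sigma^2_{column}$ is estimated by $[MSError(column) + (a - 1)MSError(cell)]/a$ and $\sigma^2_{cell} + \sigma^2_{row}$ by $[MSError(row) + (c - 1)MSError(cell)]/c$. The mean squares are those of the nitrogen by irrigation example in Section 25.3 ($a = 3$, $c = 2$, $r = 4$).

```python
from math import sqrt


def satterthwaite_df(terms):
    """terms: list of (coefficient * mean square, df) pairs in the linear combination."""
    total = sum(value for value, _ in terms)
    return total ** 2 / sum(value ** 2 / df for value, df in terms)


def cell_mean_comparisons(ms_row, ms_col, ms_cell, a, c, r):
    """Return {comparison: (standard error, approximate df)} for a balanced strip-plot."""
    df_row = (a - 1) * (r - 1)
    df_col = (c - 1) * (r - 1)
    df_cell = (a - 1) * (c - 1) * (r - 1)

    same_row = [(ms_col, df_col), ((a - 1) * ms_cell, df_cell)]
    same_col = [(ms_row, df_row), ((c - 1) * ms_cell, df_cell)]
    both = [(a * ms_row, df_row), (c * ms_col, df_col),
            ((a * c - a - c) * ms_cell, df_cell)]

    def total(terms):
        return sum(value for value, _ in terms)

    return {
        "mu_ij - mu_in": (sqrt(2 * total(same_row) / (a * r)), satterthwaite_df(same_row)),
        "mu_ij - mu_mj": (sqrt(2 * total(same_col) / (c * r)), satterthwaite_df(same_col)),
        "mu_ij - mu_mn": (sqrt(2 * total(both) / (a * c * r)), satterthwaite_df(both)),
    }


comparisons = cell_mean_comparisons(
    ms_row=16.916667 / 6, ms_col=32.791667 / 3, ms_cell=8.583333 / 6, a=3, c=2, r=4
)
for label, (se, df) in comparisons.items():
    print(f"{label}: SE = {se:.4f}, df = {df:.2f}")
```

With these inputs the sketch gives standard errors and degrees of freedom that match the last three lines of Table 25.7 to the reported precision.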


25.3 Example: Nitrogen by Irrigation 

An experiment was conducted to study the effects and relationships between two 
irrigation methods and three levels of nitrogen on the yield of wheat. The field layout utilized 
four blocks or rectangles and the randomization plan for treatment assignment, and the 
yields are shown in Figure 25.4. The SAS®-Mixed code and type III sums of squares 
analysis of variance table for these data are in Table 25.6. The F-values for comparing the 

[Field layout: in each block the nitrogen levels N1, N2, N3 are assigned to rows and the irrigation methods I1, I2 to columns.]

FIGURE 25.4 Field layout, treatment assignment and data for the nitrogen and irrigation strip-plot. 
levels of nitrogen, comparing the levels of irrigation, and for evaluating the nitrogen by 
irrigation interaction are 60.13, 52.18, and 33.12, respectively. The estimates of the standard 
errors and associated degrees of freedom or approximated degrees of freedom for the six 
types of comparisons are given in Table 25.7. The last three comparisons involve 
combinations of error terms and thus the degrees of freedom are approximated. The SAS-Mixed 
code to provide the REML estimates of the variance components is in Table 25.8 along with 
the estimates of the variance components. The tests for the fixed effects are shown in Table 
25.9. The results in Table 25.9 are identical to those from the type III analysis in Table 25.6. 
These tests are identical since the data set is balanced and the estimates of the variance 


TABLE 25.6 

SAS-Mixed Code to Provide Type III Sums of Squares Analysis for the Nitrogen 
and Irrigation Example 

PROC MIXED CL COVTEST METHOD=TYPE3; 
CLASS BLK N IRR; 
MODEL Yield=N|IRR/DDFM=KR; 
RANDOM BLK BLK*N IRR*BLK; 

Source       df   SS           Error df   F-Value   Pr > F
N            2    339.083333   6          60.13     0.0001
IRR          1    570.375000   3          52.18     0.0055
N x IRR      2     94.750000   6          33.12     0.0006
BLK          3    123.458333   3.6578      3.34     0.1479
BLK x N      6     16.916667   6           1.97     0.2147
BLK x IRR    3     32.791667   6           7.64     0.0179
Residual     6      8.583333   —          —         —
TABLE 25.7 

Six Comparisons, Estimated Standard Errors and the Degrees of 
Freedom for the Nitrogen and Irrigation Data for $i \ne m$ and $j \ne n$ 

Comparison                                     Estimated Standard Error   df or Approximate df
$\bar\mu_{i\cdot} - \bar\mu_{m\cdot}$          0.8396                     6
$\bar\mu_{\cdot j} - \bar\mu_{\cdot n}$        1.3497                     3
$\mu_{ij} - \mu_{in} - \mu_{mj} + \mu_{mn}$    1.1961                     6
$\mu_{ij} - \mu_{in}$                          1.5161                     4.62
$\mu_{ij} - \mu_{mj}$                          1.0308                     10.8
$\mu_{ij} - \mu_{mn}$                          1.6266                     5.88


TABLE 25.8 

SAS-Mixed Code to Provide the REML Analysis of the Nitrogen and Irrigation Example 

PROC MIXED CL COVTEST METHOD=REML; 
CLASS BLK N IRR; 
MODEL Yield=N|IRR/DDFM=KR; 
RANDOM BLK BLK*N IRR*BLK; 

Covariance Parameter Estimates 

Covariance Parameter   Estimate   Standard Error   Z-Value   Pr Z     Alpha   Lower    Upper
BLK                    4.8056     5.8023           0.83      0.2038   0.05    1.1037   822.35
BLK x N                0.6944     0.9127           0.76      0.2234   0.05    0.1478   286.67
BLK x IRR              3.1667     2.9876           1.06      0.1446   0.05    0.9021   88.5030
Residual               1.4306     0.8259           1.73      0.0416   0.05    0.5940   6.9369


TABLE 25.9 

Fixed Effects Tests for the REML Analysis for the 
Nitrogen and Irrigation Example 

Type III Tests of Fixed Effects 

Effect     Num df   Den df   F-Value   Pr > F
N          2        6        60.13     0.0001
IRR        1        3        52.18     0.0055
N x IRR    2        6        33.12     0.0006


components are all greater than zero. The SAS-Mixed code with estimate statements for 
providing estimates to compare the levels of nitrogen, the levels of irrigation, and the other 
four types of comparisons in Table 25.7 are given in Table 25.10. Again, there are six 
different types of comparisons with six different estimated standard errors. 

One of the inference problems with designs that involve more than one size of experimental 
unit is that various comparisons of the cell means involve different estimated standard 
errors. It is not possible to summarize this type of data by providing a single LSD value, as 
was the case for completely randomized and randomized complete block design structures. 





TABLE 25.10 

Estimate Statements Used with SAS-Mixed REML Analysis to Evaluate the Six Types of 
Comparisons 

ESTIMATE 'N1 - N2' N 1 -1 0; 
ESTIMATE 'N1 - N3' N 1 0 -1; 
ESTIMATE 'N2 - N3' N 0 1 -1; 
ESTIMATE 'IRR1 - IRR2' IRR 1 -1; 
ESTIMATE 'M11 - M12 - M21 + M22' N*IRR 1 -1 -1 1 0 0; 
ESTIMATE 'M11 - M12' IRR 1 -1 N*IRR 1 -1 0 0 0 0; 
ESTIMATE 'M11 - M21' N 1 -1 0 N*IRR 1 0 -1 0 0 0; 
ESTIMATE 'M11 - M22' N 1 -1 0 IRR 1 -1 N*IRR 1 0 0 -1 0 0; 

Estimates 

Label                    Estimate    Standard Error   df     t-Value   Pr > |t|
N1 - N2                   -3.5000    0.8396           6       -4.17    0.0059
N1 - N3                   -9.1250    0.8396           6      -10.87    <0.0001
N2 - N3                   -5.6250    0.8396           6       -6.70    0.0005
IRR1 - IRR2               -9.7500    1.3497           3       -7.22    0.0055
M11 - M12 - M21 + M22     -2.0000    1.1961           6       -1.67    0.1455
M11 - M12                -13.5000    1.5161           4.62    -8.90    0.0004
M11 - M21                 -4.5000    1.0308           10.8    -4.37    0.0012
M11 - M22                -16.0000    1.6266           5.88    -9.84    <0.0001
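
The coefficients in the estimate statements follow from writing each cell mean as $\mu_{ij} = \mu + \alpha_i + \gamma_j + (\alpha\gamma)_{ij}$ and subtracting. The sketch below builds those coefficient vectors for a difference of two cell means; the function name is hypothetical, and it assumes the N*IRR cells are ordered with the irrigation level changing fastest, which is the ordering the statements in Table 25.10 appear to use.

```python
def cell_difference_coefficients(cell_a, cell_b, n_levels=3, irr_levels=2):
    """Coefficients of mu_ij - mu_mn on the N, IRR, and N*IRR effects.

    cell_a and cell_b are 1-based (nitrogen level, irrigation level) pairs.
    """
    def indicator(size, level):
        return [1 if position == level else 0 for position in range(1, size + 1)]

    def cell_indicator(n, irr):
        # Cells ordered N1 IRR1, N1 IRR2, N2 IRR1, ... (irrigation changes fastest).
        return [1 if (i, j) == (n, irr) else 0
                for i in range(1, n_levels + 1)
                for j in range(1, irr_levels + 1)]

    def difference(u, v):
        return [x - y for x, y in zip(u, v)]

    (i, j), (m, n) = cell_a, cell_b
    return {
        "N": difference(indicator(n_levels, i), indicator(n_levels, m)),
        "IRR": difference(indicator(irr_levels, j), indicator(irr_levels, n)),
        "N*IRR": difference(cell_indicator(i, j), cell_indicator(m, n)),
    }


m11_minus_m22 = cell_difference_coefficients((1, 1), (2, 2))
m11_minus_m12 = cell_difference_coefficients((1, 1), (1, 2))
print(m11_minus_m22)
```

An effect whose coefficients are all zero, such as N in the comparison of two cells in the same nitrogen row, is simply omitted from the corresponding statement.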


25.4 Example: Strip-Plot with Split-Plot 1 

The design in Section 25.3 can be extended to the case where within each of the cells there 
are three varieties of wheat, as shown in Figure 25.5. The randomization process is not 
displayed in Figure 25.5, but the levels of irrigation were randomly assigned to the columns 
of a rectangle, the levels of nitrogen were randomly assigned to the rows of a rectangle, 
and the varieties were randomly assigned to the three plots in a cell created by the 
intersection of a row and column in each rectangle. If one averages over the levels of variety, 
then the nitrogen by irrigation analysis is identical to that in Table 25.3 or Table 25.6. The 
levels of variety are subplots within each cell and any comparisons of varieties are within-cell 
or between-subplot comparisons. A model that can be used to describe these data is 

$$y_{ijkl} = \mu_{ijk} + b_l + r_{il} + c_{jl} + d_{ijl} + \varepsilon_{ijkl}, \quad i = 1, 2, \; j = 1, 2, 3, \; k = 1, 2, 3, \; l = 1, 2, 3, 4$$

where $\mu_{ijk}$ denotes the mean of the $i$th level of irrigation with the $j$th level of nitrogen and $k$th 
variety, $b_l$ denotes the effect of the $l$th block (rectangle) assumed to be distributed $b_l \sim$ i.i.d. 
$N(0, \sigma^2_{block})$, $r_{il}$ denotes the effect of the $i$th row within the $l$th block assumed to be 
distributed $r_{il} \sim$ i.i.d. $N(0, \sigma^2_{row})$, $c_{jl}$ denotes the effect of the $j$th column within the $l$th block assumed 
to be distributed $c_{jl} \sim$ i.i.d. $N(0, \sigma^2_{column})$, $d_{ijl}$ denotes the random effect associated with the $ij$th 
cell within the $l$th block assumed to be distributed as $d_{ijl} \sim$ i.i.d. $N(0, \sigma^2_{cell})$, and $\varepsilon_{ijkl}$ denotes 
the subplot random effect associated with the $k$th variety assigned to the $ij$th cell within the 
$l$th block assumed to be distributed as $\varepsilon_{ijkl} \sim$ i.i.d. $N(0, \sigma^2_{subplot})$. The analysis of variance for 
this model is given in Table 25.11 and the SAS-Mixed code that can be used to fit the model 
is given in Table 25.12. The row error term is block x nit, the column error term is block x irr 
and the cell error term is block x nit x irr. The residual is the subplot error term and would 
be computed as the variety by block interaction pooled across the levels of nit and irr. 

FIGURE 25.5 One of the blocks of the field layout with nitrogen and irrigation as strip-plot and varieties as 
split-plot. 


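TABLE 25.11 

Analysis of Variance Table for the Strip-Plot with Split-Plot Design Structure 

Source                   df   EMS
Block                    3    $\sigma^2_{subplot} + 3\sigma^2_{cell} + 6\sigma^2_{row} + 9\sigma^2_{column} + 18\sigma^2_{block}$
Irrigation               1    $\sigma^2_{subplot} + 3\sigma^2_{cell} + 6\sigma^2_{row} + \varphi^2(I)$
Error(row)               3    $\sigma^2_{subplot} + 3\sigma^2_{cell} + 6\sigma^2_{row}$
Nitrogen                 2    $\sigma^2_{subplot} + 3\sigma^2_{cell} + 9\sigma^2_{column} + \varphi^2(N)$
Error(column)            6    $\sigma^2_{subplot} + 3\sigma^2_{cell} + 9\sigma^2_{column}$
Nitrogen x irrigation    2    $\sigma^2_{subplot} + 3\sigma^2_{cell} + \varphi^2(N \times I)$
Error(cell)              6    $\sigma^2_{subplot} + 3\sigma^2_{cell}$
Variety                  2    $\sigma^2_{subplot} + \varphi^2(V)$
Irrigation x variety     2    $\sigma^2_{subplot} + \varphi^2(I \times V)$
Nitrogen x variety       4    $\sigma^2_{subplot} + \varphi^2(N \times V)$
Nit x irr x variety      4    $\sigma^2_{subplot} + \varphi^2(N \times I \times V)$
Error(subplot)           36   $\sigma^2_{subplot}$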
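
The degrees of freedom in Table 25.11 can be checked with a short tabulation. This is a sketch with a hypothetical function name; the error terms are labeled by the interaction used to compute them, and the subplot error is taken as the variety by block interaction pooled over the nitrogen by irrigation cells, as described in the text.

```python
def strip_split_plot_df(n_nit, n_irr, n_var, n_block):
    """Degrees of freedom for a strip-plot (nitrogen, irrigation) with varieties split within cells."""
    return {
        "Block": n_block - 1,
        "Irrigation": n_irr - 1,
        "Error: block x irrigation": (n_irr - 1) * (n_block - 1),
        "Nitrogen": n_nit - 1,
        "Error: block x nitrogen": (n_nit - 1) * (n_block - 1),
        "Nitrogen x irrigation": (n_nit - 1) * (n_irr - 1),
        "Error(cell)": (n_nit - 1) * (n_irr - 1) * (n_block - 1),
        "Variety": n_var - 1,
        "Irrigation x variety": (n_irr - 1) * (n_var - 1),
        "Nitrogen x variety": (n_nit - 1) * (n_var - 1),
        "Nit x irr x variety": (n_nit - 1) * (n_irr - 1) * (n_var - 1),
        # variety x block pooled across the n_nit * n_irr cells
        "Error(subplot)": n_nit * n_irr * (n_var - 1) * (n_block - 1),
    }


df_table = strip_split_plot_df(n_nit=3, n_irr=2, n_var=3, n_block=4)
total_df = sum(df_table.values())   # should equal 3 * 2 * 3 * 4 - 1

for source, value in df_table.items():
    print(f"{source:<28} {value}")
```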
TABLE 25.12 

SAS-Mixed Code to Fit the Model to the Strip-Split-Plot Data Set in Section 25.4 

Proc Mixed CL Covtest; 
Class block nit irr var; 
Model Yield=nit|irr|var/ddfm=KR; 
Random block block*nit irr*block irr*nit*block; 
Lsmeans nit|irr|var/diff; 


25.5 Example: Strip-Plot with Split-Plot 2 

Another way (different from that in Section 25.4) that the levels of variety can be included 
in the study is displayed in Figure 25.6. The levels of variety are stripped across both 
levels of irrigation within each level of nitrogen. The levels of variety are subplots for the 
levels of nitrogen, but the levels of variety and the levels of nitrogen are strip-plots with 
the levels of irrigation. The irrigation by nitrogen analysis (obtained by summing over the 
varieties within each cell) is identical to that in Section 25.3. The nitrogen by variety part 
of the design can be obtained by summing over the levels of irrigation within each variety 
by nitrogen combination. The resulting design is a split-plot where the levels of nitrogen 
form the whole-plot factor and the levels of variety form the subplot factor. The split-plot 
analysis of variance is displayed in Table 25.13. The whole-plot error is identical to the row 
error and is computed as the block x nitrogen interaction. The subplot or subrow or 1/3 row 
error is computed as the block x variety interaction pooled across the levels of nitrogen. 
Next consider the data from nitrogen level 1 only. In this case, the resulting design is a 
strip-plot involving the levels of irrigation and varieties. The corresponding strip-plot 
analysis of variance table is in Table 25.14. The error term of interest is the subcell error for 
FIGURE 25.6 One of the blocks of the field layout with nitrogen and irrigation strip-plot, varieties as split-plots 
for levels of nitrogen and varieties as strip-plot with levels of irrigation. 
TABLE 25.13 

Split-Plot Part of the Analysis for the Examples in Section 25.5 

Source                                        df 
Block                                          3 
Nitrogen                                       2 
Error(row) = block x nitrogen                  6 
Variety                                        2 
Nitrogen x variety                             4 
Error(subrow) = block x variety(nitrogen)     18 


TABLE 25.14 

Irrigation and Variety Data for the First Level of Nitrogen in Section 25.5 

Source                                             df 
Block                                               3 
Irrigation                                          1 
Error(column) = block x irrigation                  3 
Variety                                             2 
Error(subrow) = block x variety                     6 
Irrigation x variety                                2 
Error(subcell) = block x variety x irrigation       6 


the intersection of variety and irrigation, which is computed as the block x irrigation x variety 
interaction with six degrees of freedom. These errors are pooled across the levels of 
nitrogen, yielding 18 degrees of freedom. The nitrogen x irrigation x variety interaction is 
also a subcell comparison. The pooled term is equivalent to the sum of the block x irrigation x variety 
interaction (6 degrees of freedom) and the block x nitrogen x irrigation x variety interaction 
(12 degrees of freedom), so the subcell error term has 18 degrees of freedom; with 
4 x 3 x 2 x 3 = 72 observations, the degrees of freedom of all sources then sum to 71. The 
analysis of variance table in Table 25.15 displays the sources of 
variation and the respective degrees of freedom. 
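
The degrees of freedom for this design can be tallied directly. The sketch below is only an illustration: it assumes the layout described in this section (4 blocks, 3 nitrogen levels, 2 irrigation levels, and 3 varieties, hence 72 observations), and the variable names are hypothetical rather than taken from the text. It shows that pooling the block x irrigation x variety interaction over the nitrogen levels gives the same degrees of freedom as adding block x irrigation x variety to block x nitrogen x irrigation x variety, and that the sources then account for N - 1 degrees of freedom.

```python
# Assumed layout for Section 25.5: 4 blocks, 3 nitrogen, 2 irrigation, 3 varieties.
b, n, i, v = 4, 3, 2, 3
total_df = b * n * i * v - 1  # N - 1

df = {
    "Block": b - 1,
    "Irrigation": i - 1,
    "Error(column)": (b - 1) * (i - 1),
    "Nitrogen": n - 1,
    "Error(row)": (b - 1) * (n - 1),
    "Nitrogen x irrigation": (n - 1) * (i - 1),
    "Error(cell)": (b - 1) * (n - 1) * (i - 1),
    "Variety": v - 1,
    "Variety x nitrogen": (v - 1) * (n - 1),
    # block x variety within each nitrogen level, pooled over nitrogen
    "Error(subrow)": n * (b - 1) * (v - 1),
    "Irrigation x variety": (i - 1) * (v - 1),
    "Irrigation x nitrogen x variety": (i - 1) * (n - 1) * (v - 1),
    # block x irrigation x variety within each nitrogen level, pooled over nitrogen
    "Error(subcell)": n * (b - 1) * (i - 1) * (v - 1),
}

# The pooled subcell error decomposes into two interactions.
biv = (b - 1) * (i - 1) * (v - 1)              # block x irrigation x variety
bniv = (b - 1) * (n - 1) * (i - 1) * (v - 1)   # block x nitrogen x irrigation x variety

for source, value in df.items():
    print(f"{source:34s}{value:3d}")
print("pooled subcell error:", df["Error(subcell)"], "=", biv, "+", bniv)
print("sum of df:", sum(df.values()), "| N - 1:", total_df)
```

A tally of this kind is a quick way to check any of the analysis of variance tables in this chapter: the degrees of freedom of all sources should add up to one less than the number of observations.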


25.6 Strip-Plot with Split-Plot 3 

The graphic in Figure 25.7 shows that three varieties are assigned to the columns of the 
rectangle, the combinations of herbicide and nitrogen are assigned to the rows of the 
rectangle and the levels of seeding rate are assigned to the subplots within the cell. Assume 
the design consists of four such blocks. The varieties form a strip plot with the combination 
of levels of nitrogen and herbicide and the seeding rates are the subplots within the cells. 
The two-way treatment structure for the rows is the unique feature of this design. This 
structure points out that any type of treatment structure can be associated with any of the 
features of the design structure. The analysis of variance table is in Table 25.16 where 
the row error is computed by pooling the block x herbicide interaction, the block x nitrogen 
interaction, and the block x herbicide x nitrogen interaction, providing nine degrees of 
freedom. The column error term is computed by the block x variety interaction. The cell 
TABLE 25.15 

Analysis of Variance Table for the Strip-Split-Plot Design in Section 25.5 

Source                              df 
Block                                3 
Irrigation                           1 
Error(column)                        3 
Nitrogen                             2 
Error(row)                           6 
Nitrogen x irrigation                2 
Error(cell)                          6 
Variety                              2 
Variety x nitrogen                   4 
Error(subrow)                       18 
Irrigation x variety                 2 
Irrigation x nitrogen x variety      4 
Error(subcell)                      18 


error term is obtained by pooling the block x variety x herbicide, block x variety x nitrogen and 
the block x variety x herbicide x nitrogen interactions, providing 18 degrees of freedom for 
the cell error. The subplot error is obtained by computing the block x seeding rate interaction 
for each combination of herbicide, nitrogen, and variety and then pooling these interactions 
across the levels of herbicide, nitrogen, and variety, providing 72 degrees of freedom for the 
residual error. 
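
The pooled error degrees of freedom quoted above can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming the layout described in this section (4 blocks, 2 herbicides, 2 nitrogen levels, 3 varieties, 3 seeding rates); the helper `interaction_df` and the dictionary `levels` are hypothetical names introduced only for illustration.

```python
from itertools import combinations
from math import prod

# Assumed numbers of levels for the design of Section 25.6.
levels = {"block": 4, "H": 2, "N": 2, "V": 3, "S": 3}


def interaction_df(*factors):
    """Degrees of freedom of an interaction: product of (levels - 1)."""
    return prod(levels[f] - 1 for f in factors)


# Row error: block x H, block x N and block x H x N pooled.
error_row = (interaction_df("block", "H") + interaction_df("block", "N")
             + interaction_df("block", "H", "N"))
# Column error: block x V.
error_column = interaction_df("block", "V")
# Cell error: block x V x H, block x V x N and block x V x H x N pooled.
error_cell = (interaction_df("block", "V", "H") + interaction_df("block", "V", "N")
              + interaction_df("block", "V", "H", "N"))
# Subplot error: block x S within each H x N x V combination, pooled.
n_combinations = levels["H"] * levels["N"] * levels["V"]
error_subplot = n_combinations * interaction_df("block", "S")

# All treatment main effects and interactions among H, N, V and S.
treatment_df = sum(interaction_df(*effect)
                   for size in range(1, 5)
                   for effect in combinations("HNVS", size))

total_df = prod(levels.values()) - 1
accounted = (interaction_df("block") + treatment_df
             + error_row + error_column + error_cell + error_subplot)

print(error_row, error_column, error_cell, error_subplot)
print(accounted, total_df)
```

The four error terms come out as 9, 6, 18, and 72 degrees of freedom, matching the values stated in the text, and together with the block and treatment sources they account for all N - 1 degrees of freedom.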


FIGURE 25.7 Graphic of one of the blocks of the field, with levels of herbicide and nitrogen assigned to rows 
(H1N1 through H2N2), levels of varieties assigned to columns, and seeding rates (S1, S2, and S3) assigned to the cells. 
TABLE 25.16 

Analysis of Variance for the Strip-Split-Plot Design of Section 25.6 

Source               df 
Block                 3 
H                     1 
N                     1 
N x H                 1 
Error(row)            9 
V                     2 
Error(column)         6 
V x H                 2 
V x N                 2 
V x H x N             2 
Error(cell)          18 
S                     2 
H x S                 2 
N x S                 2 
N x H x S             2 
V x S                 4 
V x H x S             4 
V x N x S             4 
V x H x N x S         4 
Error(subplot)       72 


25.7 Split-Plot with Strip-Plot 4 

This experiment involves studying the effect of two varieties, two seeding rates, two herbicides, 
and two nitrogen rates on the yield of corn. The graphic in Figure 25.8 shows how one 
of four blocks of experimental units was treated. The levels of variety and seeding rate are 
assigned to the four big squares within each block. Within each square, the levels of herbicide 
and nitrogen form a strip-plot. The strip-plot part is a subplot of the big squares. If one 
ignores the levels of nitrogen and herbicide, the design for varieties and seeding rates is a 
two-way treatment structure in a randomized complete block design structure. The error 
term is computed as the treatment structure by block interaction, which is obtained by 
pooling the block x variety, block x seeding rate, and block x variety x seeding rate interactions, 
providing a total of nine degrees of freedom. The big block error term can be obtained 
using variety x seeding x block in SAS-Mixed, and the procedure will automatically do the 
pooling for you. The analysis of variance table for the big block part of the analysis is given 
in Table 25.17. For variety 1 and seeding rate 1, the levels of herbicide and nitrogen provide 
a strip-plot design and the analysis of variance is summarized in Table 25.18. The error 
terms are pooled across the four combinations of variety and seeding rate to provide 12 
degrees of freedom each for Error(row/block) (error for rows within a block), Error(column/block) 
(error for columns within a block), and Error(cell/block) (error for cells within a block). The 
SAS-Mixed code and a complete analysis of variance table are displayed in Table 25.19. The 
FIGURE 25.8 One of the blocks of the field layout with varieties and seeding rates as whole-plots and herbicide 
and nitrogen levels in a strip-plot as subplots. 


TABLE 25.17 

Big Block Analysis of Variance for Example in Section 25.7 

Source                     df 
Block                       3 
Variety                     1 
Seeding rate                1 
Variety x seeding rate      1 
Error(big block)            9 


TABLE 25.18 

Strip-Plot Analysis of Variance for the Herbicide and Nitrogen Data from Variety 1 and Seeding 
Rate 1 for the Example in Section 25.7 

Source                     df 
Block                       3 
Herbicide                   1 
Error(row/block)            3 
Nitrogen                    1 
Error(column/block)         3 
Nitrogen x herbicide        1 
Error(cell/block)           3 


error terms are computed as follows: Error(big squares) = Variety x Seeding x Block, 
Error(row/block) = Herbicide x Block(Variety Seeding), Error(column/block) = Nitrogen x Block(Variety Seeding), 
and Error(cell/block) = Nitrogen x Herbicide x Block(Variety Seeding). 


25.8 Strip-Strip-Plot Design with Analysis via JMP7 

The strip-strip-plot design structure occurs when the experimental units are arranged into 
three-dimensional rectangles with rows, columns and tiers. An experiment from the 
TABLE 25.19 

SAS-Mixed Code and Analysis of Variance for the Strip-Plot in Split-Plot Example of Section 25.7 

Proc Mixed covtest CL; 
Class Block Variety SeedingR Herbicide Nitrogen; 
Model yield=Variety|SeedingR|Herbicide|Nitrogen/ddfm=KR; 
Random Variety*SeedingR*Block Herbicide*Block(Variety SeedingR) 
       Nitrogen*Block(Variety SeedingR); 

Source                                             df 
Block                                               3 
Variety                                             1 
Seeding rate                                        1 
Variety x seeding rate                              1 
Error(big square)                                   9 
Herbicide                                           1 
Herbicide x variety                                 1 
Herbicide x seeding rate                            1 
Herbicide x variety x seeding rate                  1 
Error(row/block)                                   12 
Nitrogen                                            1 
Nitrogen x variety                                  1 
Nitrogen x seeding rate                             1 
Nitrogen x variety x seeding rate                   1 
Error(column/block)                                12 
Herbicide x nitrogen                                1 
Herbicide x nitrogen x variety                      1 
Herbicide x nitrogen x seeding rate                 1 
Herbicide x nitrogen x variety x seeding rate       1 
Error(cell/block)                                  12 


semi-conductor industry is used to demonstrate the strip-strip-plot design structure. The 
experiment consists of evaluating three factors where each factor occurs at its own step in 
the process; that is, the experiment involves three steps. The first step is to add a layer of 
oxide to the surface of a silicon wafer (20 cm in diameter, 0.2 cm inches thick). The oxide 
layer is applied to the surface by putting the wafer into a furnace set to a specific temperature. 
Two levels of temperature were studied. The second step involves polishing or cleaning 
the surface of the wafer to smooth out any bumps that may have occurred, and two 
cleaning methods were evaluated. Finally, the third step involves washing off remaining 
surface particles that could cause interference with the circuitry being built. Two washing 
methods were included in the experiment. At each step four wafers are subjected together 
to the levels of the corresponding factor. The schematic in Figure 25.9 shows how eight 
wafers are moved through the three steps and how the wafers are grouped during each step. 
Figure 25.10 is a three-dimensional display showing that the levels of clean are assigned to 
rows, the levels of wash are assigned to columns and the levels of temperature are assigned 
to tiers. The study was repeated four times, generating four blocks. The experimental units 
for the levels of clean are the rows within a block and the row error term is computed as the 
clean x block interaction. The experimental units for the levels of wash are the columns 
within a block and the column error term is computed as the wash x block interaction. If one 
sums over the levels of temperature, the resulting structure is a strip-plot design with clean 
and wash as the two factors. The experimental units for the clean by wash interaction are 
the two wafers from the two temperatures and the error for the two temperature wafers is 
computed as the clean x wash x block interaction. The experimental units for the levels of 
temperature are the tiers of the three-dimensional rectangle. The tier error term is computed 
as the temperature x block interaction. If one sums over the levels of clean, the result is a 
strip-plot design with temperature and wash. The experimental units for the temperature by 
wash interaction are the two wafers from the two types of clean, and the error for the two 
clean wafers is computed as the temperature x wash x block interaction. If one sums over the 
levels of wash, the resulting 
design is a strip-plot with temperature and clean as the two factors. The experimental units 



FIGURE 25.9 Three-step process of applying the levels of temperature (A), clean method (C), and wash method (D). 

FIGURE 25.10 Three-dimensional rectangle showing that the levels of clean are assigned to rows, the levels of 
wash (D1, D2) are assigned to columns, and the levels of temperature are assigned to tiers. The numbers correspond to 
the wafer numbers in Figure 25.9. 
for the temperature by clean interaction are the two wafers from the two levels of wash and 
the error for the two wash wafers is computed as the temperature x clean x block interaction. 
The experimental units for the temperature by clean by wash interaction are the individual 
wafers and the error for the wafer is computed as the temperature x clean x wash x block interaction. 
The response being measured is the average thickness of the resulting layer of oxide 
measured at nine locations on each wafer (in Angstroms) after the three steps. A model that 
can be used to describe the thickness data is 

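$$y_{ijkl} = \mu_{ijk} + b_l + r_{jl} + c_{kl} + t_{il} + w^{2W}_{ijl} + w^{2C}_{ikl} + w^{2T}_{jkl} + \varepsilon_{ijkl}, \quad i = 1, 2,\; j = 1, 2,\; k = 1, 2,\; l = 1, 2, 3, 4$$

where $\mu_{ijk}$ denotes the mean thickness for level i of temperature, level j of clean and level k 
of wash and 

$$b_l \sim \text{i.i.d. } N(0, \sigma^2_{block}), \quad r_{jl} \sim \text{i.i.d. } N(0, \sigma^2_{row}), \quad c_{kl} \sim \text{i.i.d. } N(0, \sigma^2_{column}),$$
$$t_{il} \sim \text{i.i.d. } N(0, \sigma^2_{tier}), \quad w^{2W}_{ijl} \sim \text{i.i.d. } N(0, \sigma^2_{washwafer}),$$
$$w^{2C}_{ikl} \sim \text{i.i.d. } N(0, \sigma^2_{cleanwafer}), \quad w^{2T}_{jkl} \sim \text{i.i.d. } N(0, \sigma^2_{tempwafer}),$$

and 

$$\varepsilon_{ijkl} \sim \text{i.i.d. } N(0, \sigma^2_{wafer})$$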

The analysis of variance is given in Table 25.20 where w, 2Tw, 2Cw, and 2Ww denote 
wafer, two temperature wafers, two clean wafers, and two wash wafers, respectively. Each 
factor effect has its own set of experimental units and corresponding error term. 


TABLE 25.20 

Analysis of Variance for the Strip-Strip-Plot Design of Section 25.8 

Source                            df   EMS 
Block                              3   $\sigma^2_w + 2\sigma^2_{2Cw} + 2\sigma^2_{2Tw} + 2\sigma^2_{2Ww} + 4\sigma^2_{row} + 4\sigma^2_{col} + 4\sigma^2_{tier} + 8\sigma^2_{block}$ 
Temperature                        1   $\sigma^2_w + 2\sigma^2_{2Cw} + 2\sigma^2_{2Ww} + 4\sigma^2_{tier} + \varphi^2(T)$ 
Error(tier)                        3   $\sigma^2_w + 2\sigma^2_{2Cw} + 2\sigma^2_{2Ww} + 4\sigma^2_{tier}$ 
Clean                              1   $\sigma^2_w + 2\sigma^2_{2Tw} + 2\sigma^2_{2Ww} + 4\sigma^2_{row} + \varphi^2(C)$ 
Error(row)                         3   $\sigma^2_w + 2\sigma^2_{2Tw} + 2\sigma^2_{2Ww} + 4\sigma^2_{row}$ 
Wash                               1   $\sigma^2_w + 2\sigma^2_{2Tw} + 2\sigma^2_{2Cw} + 4\sigma^2_{col} + \varphi^2(W)$ 
Error(column)                      3   $\sigma^2_w + 2\sigma^2_{2Tw} + 2\sigma^2_{2Cw} + 4\sigma^2_{col}$ 
Temperature x wash                 1   $\sigma^2_w + 2\sigma^2_{2Cw} + \varphi^2(T \times W)$ 
Error(two clean wafers)            3   $\sigma^2_w + 2\sigma^2_{2Cw}$ 
Temperature x clean                1   $\sigma^2_w + 2\sigma^2_{2Ww} + \varphi^2(T \times C)$ 
Error(two wash wafers)             3   $\sigma^2_w + 2\sigma^2_{2Ww}$ 
Wash x clean                       1   $\sigma^2_w + 2\sigma^2_{2Tw} + \varphi^2(C \times W)$ 
Error(two temperature wafers)      3   $\sigma^2_w + 2\sigma^2_{2Tw}$ 
Temperature x clean x wash         1   $\sigma^2_w + \varphi^2(T \times C \times W)$ 
Error(wafer)                       3   $\sigma^2_w$ 
Figure 25.11 contains the data set and Figure 25.12 is the JMP model specification screen. 
Each of the interactions involving block has been declared to be a random effect by using the 
Attributes button. The REML estimates of the variance components and the type III tests for 
the fixed effects are displayed in Figure 25.13. There is a significant temperature x clean x wash 
interaction and those cell means need to be evaluated. Least squares means for the three-way 
interaction and letters denoting significant differences are given in Figure 25.14. Another 
way to view the pairwise comparisons of the three-way interaction means is to plot all 
pairwise differences ranked from largest to smallest with confidence intervals about the 


[JMP data table ex_25_5: 32 rows; columns block, Temp, Clean, Wash, thickness] 

Row   block   Temp   Clean   Wash   thickness 
  1     1       1      1       1       79.14 
  2     1       1      1       2       94.12 
  3     1       1      2       1       91.42 
  4     1       1      2       2      115.33 
  5     1       2      1       1       86.52 
  6     1       2      1       2      113.49 
  7     1       2      2       1      104.94 
  8     1       2      2       2      128.89 
  9     2       1      1       1      101.82 
 10     2       1      1       2      104.70 
 11     2       1      2       1      115.01 
 12     2       1      2       2      129.77 
 13     2       2      1       1      103.37 
 14     2       2      1       2      112.21 
 15     2       2      2       1      119.10 
 16     2       2      2       2      132.85 
 17     3       1      1       1      100.13 
 18     3       1      1       2      104.97 
 19     3       1      2       1      108.83 
 20     3       1      2       2      124.41 
 21     3       2      1       1      104.48 
 22     3       2      1       2      116.58 
 23     3       2      2       1      116.24 
 24     3       2      2       2      122.74 
 25     4       1      1       1       93.95 
 26     4       1      1       2       93.73 
 27     4       1      2       1      105.29 
 28     4       1      2       2      120.06 
 29     4       2      1       1      101.00 
 30     4       2      1       2      109.55 
 31     4       2      2       1      111.90 
 32     4       2      2       2      120.09 

FIGURE 25.11 Data set for three-step process for example in Section 25.8. 
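
Because the design is balanced, the three-way cell means can be obtained by averaging the four block values in each temperature x clean x wash cell. The sketch below is an illustration, not part of the JMP analysis: it assumes the thickness values are listed in the row order of Figure 25.11 (block changing slowest, then Temp, Clean, and Wash), and the function name `cell_means` is hypothetical.

```python
from collections import defaultdict

# Thickness values in the row order of Figure 25.11:
# block (1-4) changes slowest, then Temp, Clean, Wash (each 1-2).
thickness = [
    79.14, 94.12, 91.42, 115.33, 86.52, 113.49, 104.94, 128.89,       # block 1
    101.82, 104.70, 115.01, 129.77, 103.37, 112.21, 119.10, 132.85,   # block 2
    100.13, 104.97, 108.83, 124.41, 104.48, 116.58, 116.24, 122.74,   # block 3
    93.95, 93.73, 105.29, 120.06, 101.00, 109.55, 111.90, 120.09,     # block 4
]


def cell_means(values, n_blocks=4):
    """Average the block values within each (Temp, Clean, Wash) cell."""
    sums = defaultdict(float)
    index = 0
    for _block in range(n_blocks):
        for temp in (1, 2):
            for clean in (1, 2):
                for wash in (1, 2):
                    sums[(temp, clean, wash)] += values[index]
                    index += 1
    return {cell: total / n_blocks for cell, total in sums.items()}


means = cell_means(thickness)
# List the cells from largest to smallest mean thickness.
for cell, mean in sorted(means.items(), key=lambda item: -item[1]):
    print(cell, round(mean, 2))
```

The ordering of the cells and their means should agree, up to rounding of the listed data, with the least squares means that JMP reports for the three-way interaction.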
[JMP Model Specification screen. Y: thickness. Personality: Standard Least Squares; Emphasis: Minimal Report; 
Method: REML (Recommended). Model effects: block& Random, Temp, block*Temp& Random, Clean, 
block*Clean& Random, Temp*Clean, block*Temp*Clean& Random, Wash, block*Wash& Random, Temp*Wash, 
block*Temp*Wash& Random, Clean*Wash, block*Clean*Wash& Random.] 

FIGURE 25.12 Fit model screen for three-step process of Section 25.8. 


differences. Figure 25.15 is the display of the pairwise differences. The widths of the confidence 
intervals are not all equal as the variance of the difference depends on the type of 
comparison. The number of different variances is much larger for the strip-strip-plot than 
the number of different variances for the strip-plot displayed in Table 25.4. 
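
To see why the interval widths differ, one can work out which random effects cancel when two cell means are subtracted. The sketch below is a derivation under the model of this section with balanced data, not JMP's own computation: a random effect contributes 2/b times its variance component (b = 4 blocks) whenever one of the factors in its name changes level between the two cells, and the block effect always cancels. The variance components are the REML estimates reported in Figure 25.13; the function name `se_difference` is hypothetical, and the degrees of freedom and multipliers needed for the confidence limits are not computed here.

```python
from math import sqrt

# REML variance component estimates as reported in Figure 25.13.
# The block component is omitted because it cancels in every difference.
variance_components = {
    "block*Temp": 6.2692178,
    "block*Clean": 2.3090505,
    "block*Temp*Clean": 2.5416392,
    "block*Wash": 20.793786,
    "block*Temp*Wash": 1.0073028,
    "block*Clean*Wash": 1.0929335,
    "Residual": 2.2396902,
}
N_BLOCKS = 4


def se_difference(changed):
    """Standard error of the difference of two three-way cell means.

    `changed` is the set of factors whose levels differ between the cells.
    """
    total = 0.0
    for name, value in variance_components.items():
        if name == "Residual":
            total += value  # the wafer error never cancels
            continue
        factors = set(name.split("*")) - {"block"}
        if factors & changed:  # effect differs between the two cells
            total += value
    return sqrt(2.0 * total / N_BLOCKS)


for changed in ({"Clean"}, {"Temp"}, {"Wash"}, {"Temp", "Clean", "Wash"}):
    print(sorted(changed), round(se_difference(changed), 4))
```

Under these assumptions a comparison of two clean methods at the same temperature and wash has the smallest standard error and a comparison of two wash methods has the largest among the single-factor comparisons, which is consistent with the relative interval widths in the display of pairwise differences.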


25.9 Concluding Remarks 

The strip-plot design is one of the basic design structures and the examples discussed in 
this chapter illustrate some of the complexities that can occur. Several examples are 
included so that the reader can develop a method for obtaining an appropriate analysis for 
complex designs. The strip-plot design structure was used to show how several sizes of 
experimental units can occur simply by the method of assigning the levels of the factors to 
the experimental units. The basic strip-plot was analyzed to demonstrate the complexity 
in computing the estimated standard errors for making various comparisons among the 
means. Obviously, as the design becomes more complex, there are many different standard 
errors that need to be evaluated, as demonstrated with the strip-plot and split-plot 
combination designs and the strip-strip-plot design. Sample size and power computations 
[JMP Fit Least Squares report for ex_25_5; response: thickness] 

Parameter Estimates 

Term                          Estimate    Std Error   DFDen   t Ratio   Prob>|t| 
Temp[1]*Clean[1]*Wash[1]      1.5813408   0.264557      3      5.98     0.0094* 

REML Variance Component Estimates 

Random Effect       Var Ratio   Var Component   Std Error   95% Lower   95% Upper   Pct of Total 
block               8.0066054   17.932316       29.563308   -40.01177   75.876399      33.094 
block*Temp          2.7991451    6.2692178       7.2488366   -7.938502  20.476937      11.570 
block*Clean         1.0309687    2.3090505       4.2308591   -5.983433  10.601534       4.261 
block*Temp*Clean    1.1348173    2.5416392       3.1262885   -3.585886   8.6691647      4.691 
block*Wash          9.284224    20.793786       18.341215   -15.155     56.742566      38.375 
block*Temp*Wash     0.4497509    1.0073028       1.962789    -2.839764   4.8543693      1.859 
block*Clean*Wash    0.4879842    1.0929335       2.0249184   -2.875907   5.0617736      2.017 
Residual                         2.2396902       1.8286994    0.7187399 31.136318       4.133 
Total                           54.185935                                             100.000 

-2 LogLikelihood = 162.75035529 

Fixed Effect Tests 

Source             Nparm   DF   DFDen   F Ratio    Prob > F 
Temp                 1      1     3      13.3541   0.0354* 
Clean                1      1     3     101.7986   0.0021* 
Temp*Clean           1      1     3       3.3525   0.1645 
Wash                 1      1     3      13.9991   0.0333* 
Temp*Wash            1      1     3       2.2164   0.2333 
Clean*Wash           1      1     3      12.7476   0.0375* 
Temp*Clean*Wash      1      1     3      35.7283   0.0094* 

FIGURE 25.13 Estimates of the variance components and tests of the fixed effects for the three-step process. 


Level (Temp, Clean, Wash)   Letters   Least Sq Mean 
2,2,2                       A         126.14382 
1,2,2                       A B       122.39327 
2,2,1                       B C       113.04267 
2,1,2                       C D       112.95843 
1,2,1                       D E       105.13807 
1,1,2                       E F        99.37892 
2,1,1                       E F        98.84299 
1,1,1                       F          93.76016 

Levels not connected by same letter are significantly different. 

FIGURE 25.14 Three-way least square means with lines for the three-step process. 
Level   - Level   Difference   Lower CL   Upper CL 
2,2,2   1,1,1     32.38365     22.1881    42.57919 
1,2,2   1,1,1     28.63311     18.7410    38.52525 
2,2,2   2,1,1     27.30082     17.4087    37.19297 
2,2,2   1,1,2     26.76490     20.2435    33.28632 
1,2,2   2,1,1     23.55028     13.3547    33.74581 
1,2,2   1,1,2     23.01435     18.2930    27.73569 
2,2,2   1,2,1     21.00574     10.9064    31.10514 
2,2,1   1,1,1     19.28250     12.7611    25.80392 
2,1,2   1,1,1     19.19826      9.0989    29.29766 
1,2,2   1,2,1     17.25520      7.1550    27.35537 
2,2,1   2,1,1     14.19967      9.4783    18.92101 
2,1,2   2,1,1     14.11544      4.0153    24.21561 
2,2,1   1,1,2     13.66375      3.4682    23.85928 
2,1,2   1,1,2     13.57951      7.4466    19.71244 
2,2,2   2,1,2     13.18539      8.4641    17.90672 
2,2,2   2,2,1     13.10115      3.0010    23.20133 
1,2,1   1,1,1     11.37791      6.6566    16.09925 
1,2,2   2,1,2      9.43484      2.9134    15.95626 
1,2,2   2,2,1      9.35060     -0.7488    19.45000 
2,2,1   1,2,1      7.90459      1.7717    14.03753 
2,1,2   1,2,1      7.82036     -2.3752    18.01589 
1,2,1   2,1,1      6.29508     -0.2263    12.81650 
1,2,1   1,1,2      5.75915     -4.1330    15.65130 
1,1,2   1,1,1      5.61876     -4.4814    15.71893 
2,1,1   1,1,1      5.08283     -1.0501    11.21576 
2,2,2   1,2,2      3.75055     -2.3824     9.88348 
1,1,2   2,1,1      0.53593     -9.5635    10.63532 
2,2,1   2,1,2      0.08424     -9.8079     9.97638 

FIGURE 25.15 Graphical representation of the differences of the three-way least square means for the three-step 
process. 


can be carried out using the method described in Chapter 24 and by using the estimated 
standard error(s) of the comparison(s) of interest. 


25.10 Exercises 

25.1 A cupcake baking experiment was carried out similar to the one described in 
Chapter 5 with the graphical representation in Figure 5.9. In this study there are 
four temperatures, four recipes, and five days. The four recipes are made up of 
two levels of fat and two levels of fiber. The data are in the following table where 
the response is the volume (cm^3) of the cupcake on day 1, day 2,..., day 5. 

1) Identify the experimental units and their design and treatment structures. 

2) Write down a model to describe the data set. 

3) Fit the model to the data and carry out the necessary mean comparisons. 
Temperature   Fat   Fiber   Day 1   Day 2   Day 3   Day 4   Day 5 
325           H     H       37.6    33.0    31.8    36.4    31.1 
325           H     L       35.8    35.8    32.7    36.4    35.1 
325           L     H       30.7    28.2    27.1    32.1    25.7 
325           L     L       36.1    26.8    29.0    32.6    28.5 
340           H     H       44.1    40.1    32.2    38.9    41.4 
340           H     L       42.2    40.2    35.9    39.8    44.8 
340           L     H       37.6    35.2    31.3    34.4    34.4 
340           L     L       43.6    33.3    32.3    37.2    41.6 
360           H     H       45.8    39.8    36.3    42.5    41.6 
360           H     L       42.2    45.1    42.6    40.9    45.8 
360           L     H       35.8    40.6    32.4    38.4    32.8 
360           L     L       44.2    35.1    39.1    40.3    38.9 
400           H     H       48.0    43.6    34.8    41.2    39.8 
400           H     L       45.5    47.1    44.7    42.7    48.2 
400           L     H       39.6    42.7    36.6    41.8    35.8 
400           L     L       45.9    36.9    38.3    41.7    38.6 


4) Compute the standard errors for the six types of comparisons discussed in 
Section 25.2. 

5) Determine the sample size (number of days) necessary to detect a difference 
of 3 units in the volumes of cupcakes made from two different recipes. Use 
type I and II error rates of 0.05 and 0.10 respectively. 

6) Determine the sample size (number of days) necessary to detect a difference 
of 3 units in the volumes of two temperatures within the same recipe. Use 
type I and II error rates of 0.05 and 0.10 respectively. 

7) Determine the power of the test for detecting a difference of 3 units in the 
volumes of two temperatures within the same recipe using 5 days of data. 
Use type I error rate of 0.05. 

25.2 An engineer designing composite materials for aircraft designed an experiment 
to evaluate the effect of types of lacquer, paint, and temperature on the strength 
of a composite material. A plate of composite material was manufactured. The 
plate was divided into thirds row-wise and the three levels of lacquer were 
randomly assigned to the rows. The plate was divided in half column-wise and 
the two levels of paint were assigned to the columns. Finally each of four plates 
was cut into six pieces corresponding to the rows and columns. These six pieces 
were further cut into three pieces and the three levels of temperature were randomly 
assigned to the three small pieces within each of the larger pieces. The 
strength of each of the small pieces was measured by determining the amount of 
force required to break the piece while bending. The following is a diagram as to 
how one plate was treated. 
[Diagram of one plate, with the two columns labeled Paint 1 and Paint 2.] 

The data set follows. 


LACQ   Paint   Temperature   Plate 1   Plate 2   Plate 3   Plate 4 
1      1       40            143       180       186       201 
1      1       50            145       173       198       191 
1      1       60            146       187       197       202 
1      2       40            201       192       249       234 
1      2       50            209       187       243       240 
1      2       60            207       185       253       227 
2      1       40            180       175       175       223 
2      1       50            182       175       186       238 
2      1       60            185       182       179       236 
2      2       40            254       290       300       308 
2      2       50            267       286       303       295 
2      2       60            287       293       306       299 
3      1       40            199       196       226       256 
3      1       50            219       212       228       260 
3      1       60            216       210       238       267 
3      2       40            301       309       324       348 
3      2       50            310       322       319       359 
3      2       60            313       333       343       371 


1) Provide a description of the design structure by identifying each size of 
experimental unit and associated design structure. 
2) Write out a model that can be used to describe this data. 

3) Write out an analysis of variance table with the sources of variation and 
degrees of freedom. Identify the error terms for each size of experimental 
unit. 

4) Fit the model to the data and make all necessary comparisons among the 
means. 

5) Is there evidence of a linear or quadratic trend in temperature for any of the 
lacquer by paint combinations? 

25.3 An animal scientist wanted to determine the effect that different types of light-dark 
cycles have on the ability of different diets to affect the growth of chickens 
from hatching to 28 days of age. There are four diets that are combinations of 
low- and high-oil corn with two sources of protein. During days 0-14 there are 
three light-dark cycles: 24 h light and 0 h dark, 12 h light and 12 h dark, and 16 h 
light and 8 h dark. During days 15-28 there are two light-dark cycles: 12 h light 
and 12 h dark, and 16 h light and 8 h dark. The experiment was repeated three 
times, providing three blocks and three replications of the treatment combinations. 
Twenty-four groups of five male chickens were formed. These 24 groups 
were divided into six sets of four cages and within a set the four diets were 
randomly assigned to the four cages. Two of the sets of four cages were randomly 
assigned to each of the three light-dark cycles for days 0 to 14. All cages 
of five chickens assigned to a light-dark cycle were put into one room; that is, 
there were eight cages (two sets of four cages) within a room at the same time, 
all subjected to the same light-dark cycle. At day 15, one of the cages from each 
of the first light-dark cycles was assigned to one of the two second light-dark 
cycles. The following graphic shows the flow of the pens through the two steps 
of light-dark cycles where the squares represent the sets of four pens and D1, 
D2, D3, and D4 represent the four diets. 


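[Diagram: sets of four pens under the three light-dark cycles for days 0-14 (24L 0D, 12L 12D, 16L 8D), 
followed by the light-dark cycles for days 15-28.] 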

The following table gives the data that were collected, with the mean weight of 
the chickens in a pen. If the pen had four or five chickens at the end of the study, 
the pen was kept in the data set and the mean weight was recorded. If the pen 
had three or fewer chickens at the end of the 28 days, its data were not included 
in the data set. The experiment was repeated in three successive months. c_ij 
denotes the ith light-dark cycle in phase 1 and the jth light-dark cycle in the 
second phase. 


Month   Oil   Protein   c11    c12    c21    c22    c31    c32 
1       H     H         1.55   -      1.71   1.22   -      1.50 
1       H     L         -      0.66   1.34   0.94   1.65   - 
1       L     H         -      0.81   -      1.15   -      1.35 
1       L     L         1.08   0.64   -      0.94   1.55   1.16 
2       H     H         1.99   1.59   2.19   1.65   2.12   - 
2       H     L         1.60   1.27   1.76   -      1.71   1.55 
2       L     H         1.84   1.50   1.98   -      1.93   - 
2       L     L         -      1.28   1.66   1.32   1.59   1.52 
3       H     H         2.32   2.08   2.25   -      2.20   1.91 
3       H     L         1.88   1.84   1.88   1.68   1.81   1.64 
3       L     H         2.12   2.02   -      1.86   2.01   1.84 
3       L     L         -      1.79   1.82   1.65   1.69   1.67 


1) Identify the sizes of experimental units and then specify the corresponding 
design and treatment structures. 

2) Write out a model that can be used to describe this data. 

3) Write out an analysis of variance table with the sources of variation and 
degrees of freedom assuming there are no missing data. Identify the error 
terms for each size of experimental unit. 

4) Fit the model to the data and make all necessary comparisons among the 
means. 

25.4 For the model in Section 25.4, determine the variances of the estimates for each 
of the following comparisons. 

1) $\mu_{111} - \mu_{112}$ 

2) $\mu_{111} - \mu_{121}$ 

3) $\mu_{111} - \mu_{211}$ 

4) $\mu_{11\cdot} - \mu_{12\cdot} - \mu_{21\cdot} - \mu_{22\cdot}$ 





Methods for Analyzing Repeated 
Measures Experiments 


Like experiments using split-plot designs, experiments utilizing repeated measures 
designs have structures that involve more than one size of experimental unit. For example, 
a subject may be measured over time where time is one of the factors in the treatment 
structure of the experiment. By measuring the subject at several different times, the subject 
is essentially being "split" into parts (time intervals), and the response is measured on 
each part. The larger experimental unit is the subject or the collection of time intervals. 
The smaller unit is the interval of time during which the subject is exposed to a treatment 
or an interval just between time measurements. 

Repeated measures designs differ from split-plot designs in that the levels of one or 
more factors cannot be randomly assigned to one or more of the sizes of experimental 
units in the experiment. In this case, the levels of time cannot be assigned at random to the 
time intervals, and thus analyzing a repeated measures experiment as though it was a 
split-plot experiment may not be valid. Because of this nonrandom assignment, the errors 
corresponding to the respective experimental units may have a covariance matrix that 
does not conform to the covariance matrix corresponding to experiments for which the 
usual split-plot analysis is valid. Analyzing a repeated measures experiment as though it 
was a split-plot experiment is often called a split-plot in time analysis. 

In Section 26.1, the repeated measures models are described, and the assumptions necessary 
for the split-plot in time analysis of variance to be valid are given. Section 26.2 
gives three examples that demonstrate the split-plot in time analysis of variance computations, 
including the computations of standard errors for making various comparisons 
between means. In Chapter 27, methods are presented for analyzing repeated measures 
experiments when the split-plot in time assumption is not satisfied. In addition, methods 
are presented that allow one to test whether or not the split-plot in time assumption is 
satisfied. 
26.1 Model Specifications and Ideal Conditions 

Repeated measures designs can be applied in numerous situations. Example 26.1, analyzed 
in the next section, investigates the effects of three drugs on heart rates, where each 
drug was administered to eight people. Each person's heart rate was then measured 5, 10, 
15, and 20 min after administering the drug. 

In the general model for a simple repeated measures experiment, n_i subjects are randomly 
assigned to treatment i, and each subject is measured at p time points. Table 26.1 illustrates 
the layout for a simple repeated measures experiment. 

TABLE 26.1

Layout for a Simple Repeated Measures Experiment

                        TIME
TRT    Subject     1     2     3    ...    p
1      1           -     -     -    ...    -
       2
       ...
       $n_1$
2      1
       2
       ...
       $n_2$
...
t      1
       2
       ...
       $n_t$

The larger size of experimental unit is the subject, and the smaller size of experimental unit
is the time interval when using the split-plot in time notation. A model that describes each
measured response is similar to the split-plot model in a completely randomized design
and is given by

$y_{ijk} = \mu + \alpha_i + \delta_{ik} + \tau_j + (\alpha\tau)_{ij} + \varepsilon_{ijk}$    (26.1)

or

$y_{ijk} = \mu_{ij} + \delta_{ik} + \varepsilon_{ijk}$

where $\mu + \alpha_i + \delta_{ik}$ is the subject part of the model and $\tau_j + (\alpha\tau)_{ij} + \varepsilon_{ijk}$ is the within-subject
(time interval) part of the model. The mean of treatment i at time j is

$\mu_{ij} = \mu + \alpha_i + \tau_j + (\alpha\tau)_{ij}$

The $\delta_{ik}$ represent the subject error components, and the $\varepsilon_{ijk}$ represent within-subject (time
interval) errors. The ideal conditions for a split-plot in time analysis are that

1) The $\delta_{ik}$ are independently and identically $N(0, \sigma_\delta^2)$.

2) The $\varepsilon_{ijk}$ are independently and identically $N(0, \sigma_\varepsilon^2)$.

3) The $\delta_{ik}$ and the $\varepsilon_{ijk}$ are all independent of one another.

Note that these are the same assumptions that were made for the analysis of a split-plot
experiment in a completely randomized design (see Chapter 24). Such assumptions may
not always be appropriate for a repeated measures design. However, the split-plot analysis
is also a correct analysis under more general conditions. The more general conditions will
be given in Chapter 27.
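
One way to see what the ideal conditions imply is to write out the covariance matrix of the p measurements on a single subject. Under model (26.1), each measurement has variance $\sigma_\delta^2 + \sigma_\varepsilon^2$, and any two measurements on the same subject share only $\delta_{ik}$, so their covariance is $\sigma_\delta^2$. The sketch below builds that matrix; the function name and the numerical variance components are hypothetical and chosen only for illustration.

```python
# Covariance matrix of one subject's p repeated measurements implied by
# model (26.1) under the ideal conditions:
#   Var(y_ijk)          = s2_delta + s2_eps
#   Cov(y_ijk, y_ij'k)  = s2_delta   for j != j'

def split_plot_in_time_cov(p, s2_delta, s2_eps):
    """Return the p x p covariance matrix as a list of lists."""
    return [[s2_delta + (s2_eps if j == jp else 0.0) for jp in range(p)]
            for j in range(p)]

# Hypothetical variance components (not taken from the text).
cov = split_plot_in_time_cov(p=4, s2_delta=25.0, s2_eps=7.5)
for row in cov:
    print(row)

# Every pair of time points has the same correlation, no matter how far
# apart in time the two measurements are.
rho = cov[0][1] / cov[0][0]
print(round(rho, 4))
```

The equal-correlation pattern is the reason these conditions can be too restrictive for repeated measures: measurements close together in time are often more highly correlated than measurements far apart.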

In Example 26.3, the attitudes among family members were studied in relation to whether
the home was in a rural or an urban environment. The type of family required for the
study consisted of three members: a son, a father, and a mother. Then 10 such families were
randomly selected from an urban environment and seven such families were randomly
selected from a rural environment. The attitude of each family member towards a moral
issue was determined. Also of interest was whether the attitudes changed over time; hence,
measurements were made on each person at the 0, 6, and 12 month time points. There are
three sizes of experimental units in this design: the experimental unit corresponding to
the treatment effect environment is the family; the experimental unit corresponding to the
type of family member effect is the person; and the experimental unit for time is the
six-month time interval. A model to describe the attitudes has three error components, one for
each size of experimental unit in the experiment. One such model is

$y_{ijkm} = \mu + \alpha_i + f_{im}$    (family experimental unit)

$\qquad + \beta_j + (\alpha\beta)_{ij} + p_{ijm}$    (person experimental unit)

$\qquad + \tau_k + (\alpha\tau)_{ik} + (\beta\tau)_{jk} + (\alpha\beta\tau)_{ijk} + \varepsilon_{ijkm}$    (time interval experimental unit)

or

$y_{ijkm} = \mu_{ijk} + f_{im} + p_{ijm} + \varepsilon_{ijkm}$    (26.2)

In both of the above models i = 1, 2; j = 1, 2, 3; k = 1, 2, 3; m = 1, 2, ..., $n_i$, where $n_1 = 10$ and
$n_2 = 7$.

The error terms are $f_{im}$, which represents family within environment error, $p_{ijm}$, which
represents person within family error, and $\varepsilon_{ijkm}$, which represents the time within-person
error. Note that this experiment has two sets of repeated measures: the three family members
form one set of repeated measures, as family member designation cannot be randomly
assigned to family members, and time forms the second set of repeated measures. If both
sets of repeated measures satisfy traditional split-plot type assumptions, then the ideal
conditions on the error terms in the above models are

1) The $f_{im}$ are independently identically distributed $N(0, \sigma_f^2)$.

2) The $p_{ijm}$ are independently identically distributed $N(0, \sigma_p^2)$.

3) The $\varepsilon_{ijkm}$ are independently identically distributed $N(0, \sigma_\varepsilon^2)$.

4) All the $f_{im}$, $p_{ijm}$, and $\varepsilon_{ijkm}$ are independent of one another.

Chapter 27 will consider an analysis of the family member example that does not require 
that the ideal conditions be satisfied. 


26.2 The Split-Plot in Time Analyses 

The usual split-plot in time analyses of variance refer to the analyses discussed in Chapters
24 and 25 for split-plot and strip-plot designs. Such analyses are provided by most computer
packages. If the ideal conditions are satisfied, then these analyses provide valid F-tests.
Three examples are used in this section to demonstrate analyses of repeated measures
designs and to show how to determine estimates of interesting effects and their estimated
standard errors (which are necessary to make various multiple comparisons), and to provide
methods to study contrasts among various kinds of means. The examples used
here are also used in Chapter 27 to demonstrate the techniques needed when the ideal
conditions on the model error terms are not satisfied.

26.2.1 Example 26.1: Effect of Drugs on Heart Rate 

An experiment involving t drugs was conducted to study each drug's effect on the heart 
rate of humans. After the drug was administered, each person's heart rate was measured 
every five minutes for a total of p times. At the start of the study n female human subjects 
were randomly assigned to each drug. A model similar to Equation 26.1 is used to describe 
the data. The model is 

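$y_{ijk} = \mu + \alpha_i + \delta_{ik} + \tau_j + (\alpha\tau)_{ij} + \varepsilon_{ijk}, \quad i = 1, 2, \ldots, t; \; j = 1, 2, \ldots, p; \; k = 1, 2, \ldots, n$

The model has two error terms: $\delta_{ik}$ represents a subject error component, and $\varepsilon_{ijk}$
represents a time error component. The ideal conditions for a split-plot in time analysis are that

1) The $\delta_{ik}$ are independently and identically $N(0, \sigma_\delta^2)$.

2) The $\varepsilon_{ijk}$ are independently and identically $N(0, \sigma_\varepsilon^2)$.

3) The $\delta_{ik}$ and the $\varepsilon_{ijk}$ are all independent of one another.

Table 26.2 gives the split-plot in time analysis of variance table for Example 26.1.

TABLE 26.2

Analysis of Variance Table for Example 26.1

Source of Variation    df                 SS                                                                                                                      EMS
Drug                   $t-1$              $np\sum_{i=1}^{t}(\bar y_{i\cdot\cdot}-\bar y_{\cdot\cdot\cdot})^2$                                                       $\sigma_\varepsilon^2 + p\sigma_\delta^2 + Q_{Drug}$
Error(Subject)         $t(n-1)$           $p\sum_{i=1}^{t}\sum_{k=1}^{n}(\bar y_{i\cdot k}-\bar y_{i\cdot\cdot})^2$                                                 $\sigma_\varepsilon^2 + p\sigma_\delta^2$
Time                   $p-1$              $nt\sum_{j=1}^{p}(\bar y_{\cdot j\cdot}-\bar y_{\cdot\cdot\cdot})^2$                                                      $\sigma_\varepsilon^2 + Q_{Time}$
Drug × Time            $(t-1)(p-1)$       $n\sum_{i=1}^{t}\sum_{j=1}^{p}(\bar y_{ij\cdot}-\bar y_{i\cdot\cdot}-\bar y_{\cdot j\cdot}+\bar y_{\cdot\cdot\cdot})^2$    $\sigma_\varepsilon^2 + Q_{Drug \times Time}$
Error(Time)            $t(n-1)(p-1)$      $\sum_{i=1}^{t}\sum_{j=1}^{p}\sum_{k=1}^{n}(y_{ijk}-\bar y_{ij\cdot}-\bar y_{i\cdot k}+\bar y_{i\cdot\cdot})^2$           $\sigma_\varepsilon^2$

In Table 26.2, $Q_{Drug}$, $Q_{Time}$, and $Q_{Drug \times Time}$ are noncentrality parameters measuring the Drug
effect, Time effect, and the Drug × Time interaction effect, respectively. These Q values are
zero if and only if the corresponding effects are equal to zero.

To test $H_{01}$: $Q_{Drug} = 0$, one rejects $H_{01}$ if

$F = \frac{MSDrug}{MSError(Subject)} > F_{\alpha,\, t-1,\, t(n-1)}$

To test $H_{02}$: $Q_{Time} = 0$, one rejects $H_{02}$ if

$F = \frac{MSTime}{MSError(Time)} > F_{\alpha,\, p-1,\, t(p-1)(n-1)}$

To test $H_{03}$: $Q_{Drug \times Time} = 0$, one rejects $H_{03}$ if

$F = \frac{MSDrug \times Time}{MSError(Time)} > F_{\alpha,\, (t-1)(p-1),\, t(p-1)(n-1)}$

Similar to that which was done for a split-plot experiment, method of moments solutions
for the two variance components are given by

$\tilde\sigma_\varepsilon^2 = MSError(Time)$

$\tilde\sigma_\delta^2 = \frac{MSError(Subject) - MSError(Time)}{p}$

Then the estimates of the two variance components are taken as

$\hat\sigma_\delta^2 = \tilde\sigma_\delta^2$ if $\tilde\sigma_\delta^2 \ge 0$, and $\hat\sigma_\delta^2 = 0$ if $\tilde\sigma_\delta^2 < 0$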


and

$\hat\sigma_\varepsilon^2 = \tilde\sigma_\varepsilon^2$

When one finds significant effects in the data, one will generally want to compare various
means with one another. If there is no Drug × Time interaction, then one will often
want to make comparisons between the Drug main effect means and the Time main effect
means. If the interaction effect is significant, then one may wish to compare drugs with
one another at each time point and/or times to one another for each drug. The estimators
of $\mu_{ij}$, $\bar\mu_{i\cdot}$, $\bar\mu_{\cdot j}$, and $\bar\mu_{\cdot\cdot}$ are $\hat\mu_{ij} = \bar y_{ij\cdot}$, $\hat\mu_{i\cdot} = \bar y_{i\cdot\cdot}$, $\hat\mu_{\cdot j} = \bar y_{\cdot j\cdot}$, and $\hat\mu_{\cdot\cdot} = \bar y_{\cdot\cdot\cdot}$, respectively. Since repeated
measures designs involve more than one size of experimental unit and more than one
error term, the variance of each comparison could involve different functions of the variances
of the respective error terms and thus needs to be determined. Table 26.3 gives the
best estimates of various functions of the mean parameters in a repeated measures experiment
along with their respective estimated standard errors.

TABLE 26.3

Various Parameter Functions, Best Estimates, and
Estimated Standard Errors

Parameter                                   Best Estimate                                 Estimated Standard Error
$\mu_{ij}$                                  $\hat\mu_{ij}$                                $\sqrt{(\hat\sigma_\varepsilon^2 + \hat\sigma_\delta^2)/n}$
$\bar\mu_{i\cdot}$                          $\hat\mu_{i\cdot}$                            $\sqrt{(\hat\sigma_\varepsilon^2 + p\hat\sigma_\delta^2)/pn}$
$\bar\mu_{\cdot j}$                         $\hat\mu_{\cdot j}$                           $\sqrt{(\hat\sigma_\varepsilon^2 + \hat\sigma_\delta^2)/tn}$
$\bar\mu_{i\cdot} - \bar\mu_{i'\cdot}$      $\hat\mu_{i\cdot} - \hat\mu_{i'\cdot}$        $\sqrt{2(\hat\sigma_\varepsilon^2 + p\hat\sigma_\delta^2)/pn}$
$\bar\mu_{\cdot j} - \bar\mu_{\cdot j'}$    $\hat\mu_{\cdot j} - \hat\mu_{\cdot j'}$      $\sqrt{2\hat\sigma_\varepsilon^2/tn}$
$\mu_{ij} - \mu_{ij'}$                      $\hat\mu_{ij} - \hat\mu_{ij'}$                $\sqrt{2\hat\sigma_\varepsilon^2/n}$
$\mu_{ij} - \mu_{i'j}$                      $\hat\mu_{ij} - \hat\mu_{i'j}$                $\sqrt{2(\hat\sigma_\varepsilon^2 + \hat\sigma_\delta^2)/n}$

Finally, the best estimate of $\mu_{ij} - \mu_{i'j} - \mu_{ij'} + \mu_{i'j'}$ is $\hat\mu_{ij} - \hat\mu_{i'j} - \hat\mu_{ij'} + \hat\mu_{i'j'}$, and its estimated
standard error is $\sqrt{4\hat\sigma_\varepsilon^2/n}$ when $i \ne i'$ and $j \ne j'$.

Inferences about each of the parameter functions above can be made using the ratio of a
best estimate to its estimated standard error. Each ratio either has a t-distribution or can be
approximated by a t-distribution. For those ratios whose estimated standard error involves
$\hat\sigma_\varepsilon$ only, the ratio has an exact t-distribution with $t(n-1)(p-1)$ degrees of freedom. For those
ratios whose estimated standard error is a multiple of $\sqrt{\hat\sigma_\varepsilon^2 + p\hat\sigma_\delta^2}$, the ratio also has an exact
t-distribution but with $t(n-1)$ degrees of freedom. Finally, for those ratios whose estimated
standard error is a multiple of $\sqrt{\hat\sigma_\varepsilon^2 + \hat\sigma_\delta^2}$, the ratio has an approximate t-distribution whose
degrees of freedom must be approximated by Satterthwaite's method.

In general, let $\phi = \sum_i \sum_j c_{ij}\mu_{ij}$ be any linear combination of the $\mu_{ij}$; then the same linear
combination of the $\hat\mu_{ij}$, $\hat\phi = \sum_i \sum_j c_{ij}\hat\mu_{ij}$, is its best estimate. The estimated variance of $\hat\phi$ is
equal to $k[c_1\hat\sigma_1^2 + c_2\hat\sigma_2^2]$ for some k, $c_1$, and $c_2$, where $\nu_i\hat\sigma_i^2/\sigma_i^2 \sim$ independent $\chi^2(\nu_i)$ for i = 1, 2.
Then $(\hat\phi - \phi)/\widehat{s.e.}(\hat\phi)$ is approximately $t(\hat\nu)$ where

$\hat\nu = \frac{(c_1\hat\sigma_1^2 + c_2\hat\sigma_2^2)^2}{(c_1^2\hat\sigma_1^4/\nu_1) + (c_2^2\hat\sigma_2^4/\nu_2)}$

For the example being considered, $\hat\sigma_1^2$ is the MSError(Subject) with $\nu_1 = t(n-1)$ degrees
of freedom and $\hat\sigma_2^2$ is the MSError(Time) with $\nu_2 = t(n-1)(p-1)$ degrees of freedom.
Furthermore, for those ratios whose estimated standard error is a multiple of $\sqrt{\hat\sigma_\varepsilon^2 + \hat\sigma_\delta^2}$,
$c_1 = 1/p$ and $c_2 = (p-1)/p$.
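
The values of $c_1$ and $c_2$ follow from substituting the method of moments solutions into the sum of the two variance component estimates. The short derivation below is a sketch that assumes the method of moments solution for the subject variance component is nonnegative, so that no truncation at zero takes place.

```latex
\begin{align*}
\hat\sigma_\varepsilon^2 + \hat\sigma_\delta^2
  &= \text{MSError(Time)}
     + \frac{\text{MSError(Subject)} - \text{MSError(Time)}}{p} \\
  &= \frac{1}{p}\,\text{MSError(Subject)}
     + \frac{p-1}{p}\,\text{MSError(Time)} \\
  &= c_1\hat\sigma_1^2 + c_2\hat\sigma_2^2,
  \qquad c_1 = \frac{1}{p}, \quad c_2 = \frac{p-1}{p}.
\end{align*}
```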

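Next consider a linear contrast of the drug main effect means, $\sum_{i=1}^{t} c_i\bar\mu_{i\cdot}$. The estimated
standard error of the estimate of such a contrast is

$\sqrt{\frac{(\hat\sigma_\varepsilon^2 + p\hat\sigma_\delta^2)}{np}\sum_{i=1}^{t} c_i^2}$

Thus

$\frac{\sum_{i=1}^{t} c_i\hat\mu_{i\cdot} - \sum_{i=1}^{t} c_i\bar\mu_{i\cdot}}{\sqrt{\frac{(\hat\sigma_\varepsilon^2 + p\hat\sigma_\delta^2)}{np}\sum_{i=1}^{t} c_i^2}} \sim t[t(n-1)]$

Consider a linear contrast in the time main effect means, $\sum_{j=1}^{p} d_j\bar\mu_{\cdot j}$. The estimated
standard error of such a contrast is

$\sqrt{\frac{\hat\sigma_\varepsilon^2}{nt}\sum_{j=1}^{p} d_j^2}$

Thus

$\frac{\sum_{j=1}^{p} d_j\hat\mu_{\cdot j} - \sum_{j=1}^{p} d_j\bar\mu_{\cdot j}}{\sqrt{\frac{\hat\sigma_\varepsilon^2}{nt}\sum_{j=1}^{p} d_j^2}} \sim t[t(n-1)(p-1)]$

Such a contrast could be used to check for linear, quadratic, and similar trends over time.

If there is a Drug × Time interaction, then we need to compare the drugs at each time
period and/or compare time periods for each drug. That is, one could consider contrasts in
the $\mu_{ij}$ for each possibility for j, and/or contrasts in the $\mu_{ij}$ for each possibility for i. For the
former, it can be shown that

$\frac{\sum_{i=1}^{t} c_i\hat\mu_{ij} - \sum_{i=1}^{t} c_i\mu_{ij}}{\sqrt{\frac{(\hat\sigma_\varepsilon^2 + \hat\sigma_\delta^2)}{n}\sum_{i=1}^{t} c_i^2}}$

is approximately t with $\hat\nu$ degrees of freedom for each j, and for the latter it can be shown that

$\frac{\sum_{j=1}^{p} d_j\hat\mu_{ij} - \sum_{j=1}^{p} d_j\mu_{ij}}{\sqrt{\frac{\hat\sigma_\varepsilon^2}{n}\sum_{j=1}^{p} d_j^2}}$

is exactly t with $t(n-1)(p-1)$ degrees of freedom for each i. These two results can be used
to test hypotheses and set confidence intervals on within row or within column contrasts
of the $\mu_{ij}$.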

The data in Table 26.4 are used to demonstrate the analyses described above. In this
experiment there were three drugs, eight people per drug, and four time periods. The
analysis of variance table is given in Table 26.5. There is a significant Time × Drug
interaction; thus, we need to compare times with one another for each drug and drugs with one
another at each time point.


TABLE 26.4

Heart Rate Data for Example 26.1

                                        Drug
Person              AX23                BWW9                Control
within Drug    T1   T2   T3   T4    T1   T2   T3   T4    T1   T2   T3   T4
1              72   86   81   77    85   86   83   80    69   73   72   74
2              78   83   88   81    82   86   80   84    66   62   67   73
3              71   82   81   75    71   78   70   75    84   90   88   87
4              72   83   83   69    83   88   79   81    80   81   77   72
5              66   79   77   66    86   85   76   76    72   72   69   70
6              74   83   84   77    85   82   83   80    65   62   65   61
7              62   73   78   70    79   83   80   81    75   69   69   68
8              69   75   76   70    83   84   78   81    71   70   65   63

Note: Tj denotes the jth time period.
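
The split-plot in time sums of squares for these data can be computed directly from the definitional formulas for a balanced design with t drugs, n subjects per drug, and p time points. The following is a minimal sketch using only the Python standard library; the variable names are hypothetical, and a statistical package would normally be used in practice.

```python
from statistics import mean

# Heart rates from Table 26.4: data[drug][person] = [T1, T2, T3, T4]
data = {
    "AX23": [[72, 86, 81, 77], [78, 83, 88, 81], [71, 82, 81, 75],
             [72, 83, 83, 69], [66, 79, 77, 66], [74, 83, 84, 77],
             [62, 73, 78, 70], [69, 75, 76, 70]],
    "BWW9": [[85, 86, 83, 80], [82, 86, 80, 84], [71, 78, 70, 75],
             [83, 88, 79, 81], [86, 85, 76, 76], [85, 82, 83, 80],
             [79, 83, 80, 81], [83, 84, 78, 81]],
    "Control": [[69, 73, 72, 74], [66, 62, 67, 73], [84, 90, 88, 87],
                [80, 81, 77, 72], [72, 72, 69, 70], [65, 62, 65, 61],
                [75, 69, 69, 68], [71, 70, 65, 63]],
}
drugs = list(data)
t, n, p = len(drugs), 8, 4  # drugs, subjects per drug, time points

# Means over the various subscripts
grand = mean(v for d in drugs for row in data[d] for v in row)
drug_mean = {d: mean(v for row in data[d] for v in row) for d in drugs}
subj_mean = {d: [mean(row) for row in data[d]] for d in drugs}
time_mean = [mean(data[d][k][j] for d in drugs for k in range(n))
             for j in range(p)]
cell_mean = {d: [mean(data[d][k][j] for k in range(n)) for j in range(p)]
             for d in drugs}

# Sums of squares
ss_drug = n * p * sum((drug_mean[d] - grand) ** 2 for d in drugs)
ss_subj = p * sum((subj_mean[d][k] - drug_mean[d]) ** 2
                  for d in drugs for k in range(n))
ss_time = n * t * sum((m - grand) ** 2 for m in time_mean)
ss_dxt = n * sum((cell_mean[d][j] - drug_mean[d] - time_mean[j] + grand) ** 2
                 for d in drugs for j in range(p))
ss_err = sum((data[d][k][j] - cell_mean[d][j] - subj_mean[d][k]
              + drug_mean[d]) ** 2
             for d in drugs for k in range(n) for j in range(p))

# Mean squares; Drug is tested against Error(Subject), while Time and
# Drug x Time are tested against Error(Time).
ms_drug = ss_drug / (t - 1)
ms_subj = ss_subj / (t * (n - 1))
ms_time = ss_time / (p - 1)
ms_dxt = ss_dxt / ((t - 1) * (p - 1))
ms_err = ss_err / (t * (n - 1) * (p - 1))

f_drug = ms_drug / ms_subj
f_time = ms_time / ms_err
f_dxt = ms_dxt / ms_err

for name, ss, ms in [("Drug", ss_drug, ms_drug),
                     ("Error(Person)", ss_subj, ms_subj),
                     ("Time", ss_time, ms_time),
                     ("Time x Drug", ss_dxt, ms_dxt),
                     ("Error(Time)", ss_err, ms_err)]:
    print(f"{name:15s} SS = {ss:9.2f}  MS = {ms:8.2f}")
print(f"F: Drug = {f_drug:.2f}, Time = {f_time:.2f}, Time x Drug = {f_dxt:.2f}")
```

The printed sums of squares, mean squares, and F-ratios can be compared with the entries of Table 26.5.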


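TABLE 26.5

Analysis of Variance Table for Data in Table 26.4

Source of Variation    df    SS          MS        F        EMS
Drug                   2     1,333.00    666.5     5.99     $\sigma_\varepsilon^2 + 4\sigma_\delta^2 + Q_{Drug}$
Error(Person)          21    2,337.91    111.33             $\hat\sigma_1^2 = \hat\sigma_\varepsilon^2 + 4\hat\sigma_\delta^2$
Time                   3     289.61      96.54     12.96    $\sigma_\varepsilon^2 + Q_{Time}$
Time × Drug            6     527.42      87.90     11.80    $\sigma_\varepsilon^2 + Q_{Drug \times Time}$
Error(Time)            63    469.22      7.45               $\hat\sigma_2^2 = \hat\sigma_\varepsilon^2$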


Note: Q denotes the respective noncentrality parameter.

To compare times with one another for each drug, the estimated standard error of the
difference of the two means (see Table 26.3) is

$\widehat{s.e.}(\hat\mu_{ij} - \hat\mu_{ij'}) = \sqrt{2\hat\sigma_\varepsilon^2/n} = \sqrt{2(7.45)/8} = 1.365$

and a 5% LSD value for comparing time means within each drug is

$LSD_{0.05} = t_{0.025,63}\,[\widehat{s.e.}(\hat\mu_{ij} - \hat\mu_{ij'})] = (2.00)(1.365) = 2.730$

Comparisons of the time means within a drug are given in Table 26.6.

TABLE 26.6

Comparisons of Time Means at the Same Drug for the Data in Table 26.4

                       Drug
Time    AX23         BWW9            Control
1       70.50 (a)    81.75 (a, b)    72.75 (a)
2       80.50 (b)    84.00 (a)       72.38 (a)
3       81.00 (b)    78.63 (c)       71.50 (a)
4       73.13 (a)    79.75 (b, c)    71.00 (a)

Note: Means within a column with the same letter are not significantly different
at the 5% significance level. $LSD_{0.05} = 2.730$.

Since the levels of time are quantitative and equally spaced, orthogonal polynomials can
be used to check for linear and quadratic trends in the response to each drug. The measure
of the linear trend in time for the first drug is $\theta_{LT_1} = -3\mu_{11} - 1\mu_{12} + 1\mu_{13} + 3\mu_{14}$; its estimate is

$\hat\theta_{LT_1} = -3(70.50) - 1(80.50) + 1(81.00) + 3(73.13) = 8.39$

and the corresponding estimated standard error is

$\widehat{s.e.}(\hat\theta_{LT_1}) = \sqrt{\frac{7.45(9 + 1 + 1 + 9)}{8}} = 4.316$

The corresponding t-statistic is $t_c = 8.39/4.316 = 1.94$. The measure of the quadratic trend
in time for the first drug is

$\theta_{QT_1} = 1\mu_{11} - 1\mu_{12} - 1\mu_{13} + 1\mu_{14}$

its estimate is

$\hat\theta_{QT_1} = 1\hat\mu_{11} - 1\hat\mu_{12} - 1\hat\mu_{13} + 1\hat\mu_{14} = 70.50 - 80.50 - 81.00 + 73.13 = -17.87$

and its estimated standard error is

$\widehat{s.e.}(\hat\theta_{QT_1}) = \sqrt{\frac{7.45(1 + 1 + 1 + 1)}{8}} = 1.930$

The corresponding t-statistic is $t_c = -17.87/1.930 = -9.259$.

The linear and quadratic trends in time for all drugs are summarized in Table 26.7. Drug
BWW9 shows a negative linear trend, and drug AX23 shows a strong quadratic trend. The
graph in Figure 26.1 displays these relationships. To compare drugs to one another at each
time point, the estimated standard error is $\widehat{s.e.}(\hat\mu_{ij} - \hat\mu_{i'j}) = \sqrt{2(\hat\sigma_\varepsilon^2 + \hat\sigma_\delta^2)/n}$. To evaluate this
quantity, one must find estimates of each of the variance components. One gets

$\hat\sigma_\varepsilon^2 = MSError(Time) = 7.45$

and

$\hat\sigma_\delta^2 = \frac{MSError(Subject) - \hat\sigma_\varepsilon^2}{4} = \frac{111.33 - 7.45}{4} = 25.97$

TABLE 26.7

Estimates of Linear and Quadratic Trends for the Data in Table 26.4
for Each Drug

                            Drug
Trend        AX23               BWW9               Control
Linear       8.39 (1.94)        -11.37 (-2.635)    -6.13 (-1.420)
Quadratic    -17.87 (-9.259)    -1.13 (-0.585)     -0.13 (-0.067)

Note: The values in parentheses are the corresponding t statistics.
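
The trend estimates and t-statistics for the three drugs can be reproduced from the cell means and MSError(Time). The sketch below uses the rounded cell means as printed in Table 26.6 and the orthogonal polynomial coefficients given in the text for four equally spaced time points; the function and variable names are hypothetical.

```python
from math import sqrt

# Drug x Time cell means as printed in Table 26.6 (times 1 to 4)
cell_means = {
    "AX23":    [70.50, 80.50, 81.00, 73.13],
    "BWW9":    [81.75, 84.00, 78.63, 79.75],
    "Control": [72.75, 72.38, 71.50, 71.00],
}
ms_error_time = 7.45  # MSError(Time) from Table 26.5
n = 8                 # subjects per drug

# Orthogonal polynomial coefficients for four equally spaced levels
contrasts = {"linear": [-3, -1, 1, 3], "quadratic": [1, -1, -1, 1]}

def trend(means, coefs, ms_error, n):
    """Return (estimate, standard error, t-statistic) for a within-drug
    contrast in the time means."""
    est = sum(c * m for c, m in zip(coefs, means))
    se = sqrt(ms_error * sum(c * c for c in coefs) / n)
    return est, se, est / se

results = {}
for drug, means in cell_means.items():
    for name, coefs in contrasts.items():
        results[(drug, name)] = trend(means, coefs, ms_error_time, n)
        est, se, t_stat = results[(drug, name)]
        print(f"{drug:8s} {name:9s} estimate = {est:7.2f}  "
              f"s.e. = {se:.3f}  t = {t_stat:7.3f}")
```

Each t-statistic is referred to a t-distribution with 63 degrees of freedom, since only MSError(Time) enters the standard error.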


FIGURE 26.1 Means over time for each drug for the heart rate data.

Then

$\widehat{s.e.}(\hat\mu_{ij} - \hat\mu_{i'j}) = \sqrt{2(\hat\sigma_\varepsilon^2 + \hat\sigma_\delta^2)/n} = \sqrt{2(7.45 + 25.97)/8} = 2.891$

The Satterthwaite estimated degrees of freedom for this estimated standard error is

$\hat\nu = \frac{[c_1\hat\sigma_1^2 + c_2\hat\sigma_2^2]^2}{\frac{c_1^2\hat\sigma_1^4}{\nu_1} + \frac{c_2^2\hat\sigma_2^4}{\nu_2}} = \frac{[\frac{1}{4}(111.33) + \frac{3}{4}(7.45)]^2}{\frac{\frac{1}{16}(111.33)^2}{21} + \frac{\frac{9}{16}(7.45)^2}{63}} = \frac{(33.42)^2}{36.888 + 0.496} = 29.9$


An approximate LSD at the 5% significance level to compare pairs of drug means to one
another at each time point is

$LSD_{0.05} = t_{0.025,29.9} \times \widehat{s.e.}(\hat\mu_{ij} - \hat\mu_{i'j}) = (2.042)(2.891) = 5.903$

Comparisons of the drugs to one another at each time point are given in Table 26.8.
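
The chain of calculations leading to this LSD (method of moments estimates, standard error, Satterthwaite degrees of freedom) can be collected in a few lines. The sketch below takes the mean squares and degrees of freedom from Table 26.5 and uses the critical value 2.042 quoted in the text rather than computing a t quantile; the function name `satterthwaite_df` is hypothetical.

```python
from math import sqrt

def satterthwaite_df(coefs, mean_squares, dfs):
    """Approximate degrees of freedom for sum(c_i * MS_i), where the
    MS_i are independent mean squares with dfs[i] degrees of freedom."""
    num = sum(c * ms for c, ms in zip(coefs, mean_squares)) ** 2
    den = sum((c * ms) ** 2 / v for c, ms, v in zip(coefs, mean_squares, dfs))
    return num / den

ms_subject, df_subject = 111.33, 21  # Error(Person) line of Table 26.5
ms_time, df_time = 7.45, 63          # Error(Time) line of Table 26.5
n, p = 8, 4                          # subjects per drug, time points

# Method of moments estimates, truncating the subject component at zero
s2_eps = ms_time
s2_delta = max(0.0, (ms_subject - ms_time) / p)

# Standard error of a difference between two drugs at the same time
se_diff = sqrt(2 * (s2_eps + s2_delta) / n)

# s2_eps + s2_delta = (1/p) MSError(Subject) + ((p-1)/p) MSError(Time)
df_hat = satterthwaite_df([1 / p, (p - 1) / p],
                          [ms_subject, ms_time],
                          [df_subject, df_time])

t_crit = 2.042  # t_{0.025, 29.9} as quoted in the text
lsd = t_crit * se_diff

print(f"sigma^2_delta = {s2_delta:.2f}, s.e. = {se_diff:.3f}")
print(f"Satterthwaite df = {df_hat:.1f}, LSD = {lsd:.3f}")
```

Small differences in the last decimal place relative to the text can arise because the text rounds the standard error to 2.891 before multiplying.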

Suppose drugs AX23 and BWW9 are experimental and we want to compare their average
to the control at each time point. A relevant contrast for this comparison at time 1 is
$\theta = \mu_{11} + \mu_{21} - 2\mu_{31}$. Its estimate is $\hat\theta = 70.5 + 81.75 - 2 \times 72.75 = 6.75$, and its estimated
standard error is

$\widehat{s.e.}(\hat\theta) = \sqrt{\frac{(\hat\sigma_\varepsilon^2 + \hat\sigma_\delta^2)\sum_i c_i^2}{n}} = \sqrt{\frac{(7.45 + 25.97)(1 + 1 + 4)}{8}} = 5.006$

The corresponding t-statistic is $t_c = 6.75/5.006 = 1.348$. We fail to reject $H_0$: $\theta = 0$ since 1.348 <
$t_{0.025,29.9} = 2.042$.

The comparisons of the means of the two experimental drugs to the control at each time
period are given in Table 26.9. The results show that the means of the drugs are significantly
different at the 5% significance level for the last three time intervals, but they are not
different at time 1.
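
The same contrast can be evaluated at all four time points in a short loop. This is a sketch that uses the rounded cell means printed in Tables 26.6 and 26.8, the variance component estimates 7.45 and 25.97 from the text, and the critical value 2.042 quoted above; the variable names are hypothetical.

```python
from math import sqrt

# Drug x Time cell means as printed in the text (times 1 to 4)
cell_means = {
    "AX23":    [70.50, 80.50, 81.00, 73.13],
    "BWW9":    [81.75, 84.00, 78.63, 79.75],
    "Control": [72.75, 72.38, 71.50, 71.00],
}
# Contrast: AX23 + BWW9 - 2 * Control, evaluated within each time point
coefs = {"AX23": 1, "BWW9": 1, "Control": -2}

s2_eps, s2_delta = 7.45, 25.97  # variance component estimates
n = 8                           # subjects per drug
t_crit = 2.042                  # t_{0.025, 29.9} as quoted in the text

# Comparing drugs at a fixed time involves both variance components
se = sqrt((s2_eps + s2_delta) * sum(c * c for c in coefs.values()) / n)

results = []
for j in range(4):
    est = sum(coefs[d] * cell_means[d][j] for d in coefs)
    t_stat = est / se
    results.append((j + 1, est, t_stat, abs(t_stat) > t_crit))
    print(f"time {j + 1}: estimate = {est:6.2f}, t = {t_stat:5.2f}, "
          f"significant at 5%: {abs(t_stat) > t_crit}")
```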


TABLE 26.8

Comparisons of Drug Means at the Same Time for the Data in Table 26.4

                       Drug
Time    AX23         BWW9         Control
1       70.50 (a)    81.75 (b)    72.75 (a)
2       80.50 (a)    84.00 (a)    72.38 (b)
3       81.00 (a)    78.63 (a)    71.50 (b)
4       73.13 (a)    79.75 (b)    71.00 (a)

Note: Means within a row with the same letter are not significantly different at the
5% significance level. $LSD_{0.05} = 5.903$.


TABLE 26.9

Comparisons of the Means of Drugs AX23 and BWW9 to the
Control at Each Time Point

                            Time
Statistic        1       2        3        4
$\hat\theta$     6.75    19.74    16.63    10.88
$t_c$            1.35    3.94*    3.32*    2.17*

Note: *Denotes significance at the 5% level, $t_{0.025,29.9} = 2.042$.


26.2.2 Example 26.2: A Complex Comfort Experiment 

An engineer had three environments in which to test two types of clothing. Since responses
to an environment also differ between males and females, sex of person was included as a
factor. Four people (two males and two females) were put into an environmental chamber
(which was assigned to one of the three environments). One male and one female wore
clothing type 1, and the other male and female wore clothing type 2. The comfort score of
each person was recorded at the end of 1, 2, and 3 h. The data for this experiment are
shown in Table 26.10. There are three sizes of experimental units. The largest experimental
unit is a chamber or, equivalently, a group of four people. The experimental design for the
chamber is a one-way treatment structure (environment is the treatment) in a
completely randomized design structure with three replications at each level of the
environment. The middle-sized experimental unit is a person. The experimental design
for a person is a two-way treatment structure (sex × clothing) in a randomized complete
block design structure in nine blocks [each block contains four experimental units
(people)]. The smallest experimental unit is a 1 h time interval, which we will call hour. The
experimental design for hour is a one-way treatment structure (time) in a randomized
complete block design structure in 36 blocks [each block contains three experimental units
(1 h time intervals)].

The model for this experiment (where the model is separated into parts corresponding
to the three sizes of experimental units and assuming the hour measurements satisfy
split-plot in time assumptions) is

$Y_{ijkmn} = \mu + E_i + r_{in}$    (Chamber part)

$\qquad + S_j + C_k + (SC)_{jk} + (ES)_{ij} + (EC)_{ik} + (ESC)_{ijk} + p_{ijkn}$    (Person part)

$\qquad + T_m + (ET)_{im} + (ST)_{jm} + (CT)_{km} + (SCT)_{jkm} + (EST)_{ijm} + (ECT)_{ikm} + (ESCT)_{ijkm} + \varepsilon_{ijkmn}$    (Hour part)

where E denotes environment, S denotes sex, C denotes clothing type, T denotes time, $r_{in}$
denotes the random chamber effect assumed to be distributed i.i.d. $N(0, \sigma_r^2)$, $p_{ijkn}$ denotes
the random person effect assumed to be distributed i.i.d. $N(0, \sigma_p^2)$, and $\varepsilon_{ijkmn}$ denotes the
measurement error for a given hour, which is assumed to be i.i.d. $N(0, \sigma_\varepsilon^2)$. In addition, all
the $r_{in}$, $p_{ijkn}$, and $\varepsilon_{ijkmn}$ are independently distributed error terms.

The analysis of variance table is given in Table 26.11 under the assumptions on the error
terms given above. The F-statistics were computed by using the expected mean squares as
a guide.


TABLE 26.10

Data for Comfort Experiment in Example 26.2

                                                      Environment
Replication    Sex    Clothing Type    Time    1          2          3
1              M      1                1       13.9001    10.2881    7.4205
1              M      1                2       7.5154     6.909      7.1752
1              M      1                3       10.9742    8.4138     7.1218
1              M      2                1       18.3941    13.8631    12.541
1              M      2                2       12.4151    10.1492    11.9157
1              M      2                3       15.2241    12.5372    12.2239
1              F      1                1       10.0149    6.1634     3.8293
1              F      1                2       3.7669     2.2837     3.5868
1              F      1                3       7.0326     4.0052     3.3004
1              F      2                1       16.4774    13.0291    11.0002
1              F      2                2       10.4104    9.7775     11.0282
1              F      2                3       13.1143    11.6576    10.5662
2              M      1                1       15.7185    11.9904    11.8158
2              M      1                2       9.717      8.4793     11.9721
2              M      1                3       12.508     9.8694     11.7187
2              M      2                1       19.7547    14.8587    16.4418
2              M      2                2       13.5293    11.0317    16.6355
2              M      2                3       16.5487    12.8317    16.6686
2              F      1                1       10.6902    6.7562     7.5707
2              F      1                2       4.8473     2.5634     7.3456
2              F      1                3       7.9829     4.7547     7.2404
2              F      2                1       17.1147    13.8977    13.5421
2              F      2                2       11.3858    9.6643     13.5672
2              F      2                3       14.1502    11.6034    14.024
3              M      1                1       14.9015    9.7589     7.2364
3              M      1                2       9.1825     6.1772     7.8304
3              M      1                3       11.5819    8.0785     7.4147
3              M      2                1       18.0402    13.5513    12.0689
3              M      2                2       12.1004    9.3052     12.5003
3              M      2                3       15.4893    11.5259    12.179
3              F      1                1       10.1944    4.5203     1.8330
3              F      1                2       4.1716     0.5913     1.6769
3              F      1                3       6.9688     2.8939     1.8065
3              F      2                1       16.0789    12.5057    9.4934
3              F      2                2       10.2357    7.7502     9.7000
3              F      2                3       12.4853    10.5226    10.0119

The error sums of squares in Table 26.11 were computed as

SSERROR(PERSON) = Replication × Sex × Clothing(Environment) SS
    + Replication × Sex(Environment) SS
    + Replication × Clothing(Environment) SS

and

SSERROR(CHAMBER) = Chamber(Environment) SS


TABLE 26.11

Analysis of Variance Table for Example 26.2

Source of Variation                     df    SS        MS        F           Expected Mean Square
Environment                             2     191.69    95.85     3.28        $\sigma_\varepsilon^2 + 3\sigma_p^2 + 12\sigma_r^2 + Q_1$
Error(Chamber)                          6     175.26    29.21                 $\sigma_\varepsilon^2 + 3\sigma_p^2 + 12\sigma_r^2$
Sex                                     1     289.46    289.46    501.89      $\sigma_\varepsilon^2 + 3\sigma_p^2 + Q_2$
Clothing                                1     806.11    806.11    1,397.70    $\sigma_\varepsilon^2 + 3\sigma_p^2 + Q_3$
Sex × Clothing                          1     55.96     55.96     97.04       $\sigma_\varepsilon^2 + 3\sigma_p^2 + Q_4$
Environment × Sex                       2     0.78      0.39      0.68        $\sigma_\varepsilon^2 + 3\sigma_p^2 + Q_5$
Environment × Clothing                  2     4.31      2.16      3.73        $\sigma_\varepsilon^2 + 3\sigma_p^2 + Q_6$
Environment × Sex × Clothing            2     4.41      2.21      3.82        $\sigma_\varepsilon^2 + 3\sigma_p^2 + Q_7$
Error(Person)                           18    10.38     0.58                  $\sigma_\varepsilon^2 + 3\sigma_p^2$
Time                                    2     194.6     97.3      1,672.24    $\sigma_\varepsilon^2 + Q_8$
Time × Environment                      4     111.65    27.91     479.7       $\sigma_\varepsilon^2 + Q_9$
Time × Sex                              2     0.08      0.04      0.67        $\sigma_\varepsilon^2 + Q_{10}$
Time × Clothing                         2     0.08      0.04      0.71        $\sigma_\varepsilon^2 + Q_{11}$
Time × Sex × Clothing                   2     0.2       0.1       1.68        $\sigma_\varepsilon^2 + Q_{12}$
Time × Environment × Sex                4     0.18      0.04      0.78        $\sigma_\varepsilon^2 + Q_{13}$
Time × Environment × Clothing           4     0.26      0.06      1.14        $\sigma_\varepsilon^2 + Q_{14}$
Time × Environment × Sex × Clothing     4     0.17      0.04      0.71        $\sigma_\varepsilon^2 + Q_{15}$
Error(Hour)                             48    2.79      0.06                  $\sigma_\varepsilon^2$

Notes: All figures in the table are rounded to two decimal places, but the calculations were done in
double precision. $Q_i$ denotes the noncentrality parameter corresponding to the given effect.

The next step in the analysis is to make the needed comparisons. If one selects $\alpha = 0.01$ as
the probability of a type I error, then there are two significant interactions: Environment × Time
and Sex × Clothing. For the Sex × Clothing interaction, one will want to compare the four
Sex × Clothing means. Since both treatments were applied to the same size of experimental
unit (person), only one standard error needs to be computed.

Let

$\mu_{ijkm} = \mu + E_i + S_j + C_k + (SC)_{jk} + (ES)_{ij} + (EC)_{ik} + (ESC)_{ijk} + T_m + (ET)_{im}$
$\qquad + (ST)_{jm} + (CT)_{km} + (SCT)_{jkm} + (EST)_{ijm} + (ECT)_{ikm} + (ESCT)_{ijkm}$

The differences $\bar\mu_{\cdot jk\cdot} - \bar\mu_{\cdot j'k'\cdot}$ can be estimated by $\bar Y_{\cdot jk\cdot\cdot} - \bar Y_{\cdot j'k'\cdot\cdot}$, and the estimated standard
error of $\bar Y_{\cdot jk\cdot\cdot} - \bar Y_{\cdot j'k'\cdot\cdot}$ is given by

$\widehat{s.e.}(\bar Y_{\cdot jk\cdot\cdot} - \bar Y_{\cdot j'k'\cdot\cdot}) = \sqrt{\frac{2\,MSERROR(PERSON)}{3 \cdot 3 \cdot 3}} = \sqrt{\frac{2(0.58)}{27}} = 0.207$

with 18 degrees of freedom. Fisher's LSD at the 1% significance level is

$(t_{0.005,18})\,\widehat{s.e.}(\bar Y_{\cdot jk\cdot\cdot} - \bar Y_{\cdot j'k'\cdot\cdot}) = 2.878 \cdot 0.207 = 0.596$


TABLE 26.12

Sex × Clothing Means with LSD for Example 26.2

Sex    Clothing    Mean Score
M      1           9.839
M      2           13.864
F      1           5.126
F      2           12.029

Note: $LSD_{0.01} = 0.595$. All means are based on 27 observations.


Table 26.12 has the Sex x Clothing means, all of which are significantly different from one 
another. 
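
The pairwise comparisons behind this statement can be checked mechanically. The sketch below takes the four means from Table 26.12, MSError(Person) = 0.58 from Table 26.11, and the critical value $t_{0.005,18} = 2.878$ quoted in the text; the variable names are hypothetical.

```python
from itertools import combinations
from math import sqrt

# Sex x Clothing means from Table 26.12, each based on 27 observations
means = {("M", 1): 9.839, ("M", 2): 13.864, ("F", 1): 5.126, ("F", 2): 12.029}
ms_error_person = 0.58  # MSError(Person) from Table 26.11
n_obs = 27              # 3 environments x 3 replications x 3 times
t_crit = 2.878          # t_{0.005, 18} as quoted in the text

se = sqrt(2 * ms_error_person / n_obs)
lsd = t_crit * se
print(f"s.e. = {se:.3f}, LSD(0.01) = {lsd:.3f}")

# Compare every pair of means with the LSD
comparisons = []
for a, b in combinations(means, 2):
    diff = abs(means[a] - means[b])
    comparisons.append((a, b, diff, diff > lsd))
    print(f"{a} vs {b}: |difference| = {diff:.3f}, exceeds LSD: {diff > lsd}")
```

The smallest absolute difference is between the two clothing type 2 means, and it is still about three times the LSD.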

The Environment x Time interaction involves two different sizes of experimental units; 
thus, there are two types of comparisons. First, one can compare two time means for each 
environment, and one can check for linear and quadratic trends between the time means 
for each environment. Second, one can compare the environments to one another at each 
time point. 

The standard error of a comparison of two time means at the same environment is

$s.e.(\bar Y_{i\cdot\cdot m\cdot} - \bar Y_{i\cdot\cdot m'\cdot}) = s.e.[(\bar r_{i\cdot} - \bar r_{i\cdot}) + (\bar p_{i\cdot\cdot\cdot} - \bar p_{i\cdot\cdot\cdot}) + (\bar\varepsilon_{i\cdot\cdot m\cdot} - \bar\varepsilon_{i\cdot\cdot m'\cdot})] = s.e.[(\bar\varepsilon_{i\cdot\cdot m\cdot} - \bar\varepsilon_{i\cdot\cdot m'\cdot})] = \sqrt{\frac{2\sigma_\varepsilon^2}{2 \cdot 2 \cdot 3}}$

and the estimated standard error is

$\sqrt{\frac{2\hat\sigma_\varepsilon^2}{2 \cdot 2 \cdot 3}} = \sqrt{\frac{2(0.06)}{12}} = 0.1$

The 1% LSD value for comparing two times in the same environment is $t_{0.005,48}(0.1)$ =
2.682(0.1) = 0.268.

Table 26.13 contains the Environment × Time means and comparisons among times for
each environment.


TABLE 26.13

Environment × Time Means for Comparing Time
Means within Each Environment for Example 26.2

                  Environment
Time    1            2            3
1       15.11 (a)    10.93 (a)    9.57 (a)
2       9.11 (c)     7.07 (c)     9.57 (a)
3       12.01 (b)    9.06 (b)     9.52 (a)

Note: Within a given environment (column), time means
with the same letter are not significantly different;
$LSD_{0.01} = 0.268$.

Since the three levels of time are equally spaced, orthogonal polynomials can be used to
measure linear and quadratic trends over time for each environment. The linear trend at
environment 1 is measured by $\hat\theta_{LT_1} = -\bar Y_{1\cdot\cdot1\cdot} + 0\bar Y_{1\cdot\cdot2\cdot} + \bar Y_{1\cdot\cdot3\cdot} = -15.11 + 12.01 = -3.10$, and its
estimated standard error is

$\widehat{s.e.}(\hat\theta_{LT_1}) = \sqrt{\frac{\hat\sigma_\varepsilon^2[(-1)^2 + (0)^2 + (1)^2]}{12}} = \sqrt{\frac{(0.06)(2)}{12}} = 0.1$

The corresponding t-statistic is $t_c = -3.10/0.1 = -31.0$ with 48 degrees of freedom. The
quadratic trend across time points at environment 1 is measured by $\hat\theta_{QT_1} = 1\bar Y_{1\cdot\cdot1\cdot} - 2\bar Y_{1\cdot\cdot2\cdot} + 1\bar Y_{1\cdot\cdot3\cdot}$ =
15.11 - 2(9.11) + 12.01 = 8.90. Its estimated standard error is given by

$\widehat{s.e.}(\hat\theta_{QT_1}) = \sqrt{\frac{\hat\sigma_\varepsilon^2[(1)^2 + (-2)^2 + (1)^2]}{12}} = \sqrt{\frac{(0.06)(6)}{12}} = 0.173$

and the corresponding t-statistic is $t_c = 8.90/0.173 = 51.45$ with 48 degrees of freedom.

The linear and quadratic trends for each environment are given in Table 26.14. There are
significant linear and quadratic trends for environments 1 and 2, but none for
environment 3. Figure 26.2 displays the relationships.


TABLE 26.14

Measures of Linear and Quadratic Trends for Each Environment
in Example 26.2

                          Environment
Trend        1                2                 3
Linear       -3.10 (-31.0)    -1.87 (-18.70)    -0.05 (-0.5)
Quadratic    8.90 (51.45)     5.85 (33.81)      -0.05 (-0.289)

Note: t-values are in parentheses.


FIGURE 26.2 Response over time at each environment for the comfort data.

The second type of Environment × Time comparison is to compare the different
environments at the same time or at different times. The standard error of such a comparison is

$s.e.(\bar Y_{i\cdot\cdot m\cdot} - \bar Y_{i'\cdot\cdot m\cdot}) = s.e.[(\bar r_{i\cdot} - \bar r_{i'\cdot}) + (\bar p_{i\cdot\cdot\cdot} - \bar p_{i'\cdot\cdot\cdot}) + (\bar\varepsilon_{i\cdot\cdot m\cdot} - \bar\varepsilon_{i'\cdot\cdot m\cdot})]$

$\qquad = \sqrt{\frac{2\sigma_r^2}{3} + \frac{2\sigma_p^2}{2 \cdot 2 \cdot 3} + \frac{2\sigma_\varepsilon^2}{2 \cdot 2 \cdot 3}} = \sqrt{\frac{2}{12}(\sigma_\varepsilon^2 + \sigma_p^2 + 4\sigma_r^2)}$

The quantity $\sigma_\varepsilon^2 + \sigma_p^2 + 4\sigma_r^2$ can be estimated by (1/3) MSERROR(CHAMBER) + (2/3)
MSERROR(HOUR). Therefore the Satterthwaite estimated degrees of freedom is

$\hat\nu = \frac{[c_1\hat\sigma_1^2 + c_2\hat\sigma_2^2]^2}{\frac{c_1^2\hat\sigma_1^4}{\nu_1} + \frac{c_2^2\hat\sigma_2^4}{\nu_2}} = \frac{[\frac{1}{3}(29.21) + \frac{2}{3}(0.06)]^2}{\frac{\frac{1}{9}(29.21)^2}{6} + \frac{\frac{4}{9}(0.06)^2}{48}} = \frac{(9.777)^2}{15.800 + 0.00003} = 6.05$


Since

$\sqrt{\frac{2[\frac{1}{3}MSERROR(Chamber) + \frac{2}{3}MSERROR(Hour)]}{12}} = \sqrt{\frac{2[\frac{1}{3}(29.21) + \frac{2}{3}(0.06)]}{12}} = 1.276$

a 1% LSD for comparing environments within each time point is $LSD_{0.01} = t_{0.005,6}(1.276)$ =
(3.707)(1.276) = 4.73. Table 26.15 contains the Environment × Time means with Fisher LSD
comparisons made between different environments at the same times. This example
illustrates why it is very important to use correct error terms when comparing within a set
of interaction means: The two LSD values may be extremely different. In this example,
the two values are 0.268 and 4.73. Thus, the $LSD_{0.01}$ value for comparing environments at
the same level of time is more than 17 times larger than the $LSD_{0.01}$ value for comparing
times within the same environment.
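
The contrast between the two LSD values is easy to reproduce from the mean squares. The sketch below uses MSERROR(CHAMBER) = 29.21 and MSERROR(HOUR) = 0.06 from Table 26.11 together with the critical values 2.682 and 3.707 quoted in the text; the variable names are hypothetical.

```python
from math import sqrt

ms_chamber = 29.21         # MSERROR(CHAMBER), 6 df
ms_hour = 0.06             # MSERROR(HOUR), 48 df
obs_per_mean = 2 * 2 * 3   # sexes x clothing types x replications

t_within = 2.682    # t_{0.005, 48}: times within the same environment
t_between = 3.707   # t_{0.005, 6}: environments at the same time

# Times within an environment: chamber and person effects cancel,
# so only the hour error remains.
se_within = sqrt(2 * ms_hour / obs_per_mean)

# Environments at the same time: all three variance components enter,
# estimated by (1/3) MSERROR(CHAMBER) + (2/3) MSERROR(HOUR).
se_between = sqrt(2 * (ms_chamber / 3 + 2 * ms_hour / 3) / obs_per_mean)

lsd_within = t_within * se_within
lsd_between = t_between * se_between
ratio = lsd_between / lsd_within

print(f"within:  s.e. = {se_within:.3f}, LSD = {lsd_within:.3f}")
print(f"between: s.e. = {se_between:.3f}, LSD = {lsd_between:.2f}")
print(f"ratio of LSDs = {ratio:.1f}")
```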


TABLE 26.15

Environment × Time Means with Comparisons
between Environment Means at Each Time Point

                  Environment
Time    1            2             3
1       15.11 (a)    10.93 (ab)    9.57 (b)
2       9.11 (a)     7.07 (a)      9.57 (a)
3       12.01 (a)    9.06 (a)      9.52 (a)

Note: Within a given time (row), environment means with the
same letter are not significantly different; $LSD_{0.01} = 4.73$.

To demonstrate how to use the information in Section 24.3 to compute standard errors
and corresponding t-statistics for cases where more than two sizes of experimental units
are used, we construct an LSD for comparing environments and sexes within the same
time point. Suppose the comparison of interest is $\bar\mu_{11\cdot1} - \bar\mu_{22\cdot1}$. The best estimate of this is
$\bar Y_{11\cdot1\cdot} - \bar Y_{22\cdot1\cdot} = 8.04$. Next note that the variance of this estimate is

$Var(\bar Y_{11\cdot1\cdot} - \bar Y_{22\cdot1\cdot}) = Var[(\bar r_{1\cdot} - \bar r_{2\cdot}) + (\bar p_{11\cdot\cdot} - \bar p_{22\cdot\cdot}) + (\bar\varepsilon_{11\cdot1\cdot} - \bar\varepsilon_{22\cdot1\cdot})]$

$\qquad = \frac{2\sigma_r^2}{3} + \frac{2\sigma_p^2}{2 \cdot 3} + \frac{2\sigma_\varepsilon^2}{2 \cdot 3} = \frac{2}{6}(\sigma_\varepsilon^2 + \sigma_p^2 + 2\sigma_r^2)$

The quantity inside the parentheses is estimated by

(1/6)MSERROR(CHAMBER) + (1/6)MSERROR(PERSON)
    + (2/3)MSERROR(HOUR) = (1/6)(29.21) + (1/6)(0.58) + (2/3)(0.06) = 5.005

so the estimated standard error of $\bar Y_{11\cdot1\cdot} - \bar Y_{22\cdot1\cdot}$ is $\widehat{s.e.}(\bar Y_{11\cdot1\cdot} - \bar Y_{22\cdot1\cdot}) = \sqrt{2(5.005)/6} = 1.2916$. The
Satterthwaite estimated degrees of freedom associated with this estimated standard error is

$\hat\nu = \frac{[(\frac{1}{6})(29.21) + (\frac{1}{6})(0.58) + (\frac{2}{3})(0.06)]^2}{\frac{(\frac{1}{6})^2(29.21)^2}{6} + \frac{(\frac{1}{6})^2(0.58)^2}{18} + \frac{(\frac{2}{3})^2(0.06)^2}{48}} = \frac{(5.005)^2}{3.9501 + 0.0005 + 0.00003} = \frac{25.05}{3.9506} \approx 6.34$

The methods used to compute the above two standard errors can be applied to other 
situations where a given comparison can be partitioned into the sum of components where 
each component is a comparison of only one size of experimental unit. 
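
Satterthwaite's approximation extends directly from two mean squares to any number of independent mean squares, which is what the three-component calculation above requires. The following sketch repeats that calculation with the mean squares and degrees of freedom from Table 26.11; the function name `satterthwaite_df` is hypothetical.

```python
from math import sqrt

def satterthwaite_df(coefs, mean_squares, dfs):
    """Approximate degrees of freedom for the linear combination
    sum(c_i * MS_i) of independent mean squares."""
    num = sum(c * ms for c, ms in zip(coefs, mean_squares)) ** 2
    den = sum((c * ms) ** 2 / v for c, ms, v in zip(coefs, mean_squares, dfs))
    return num / den

# MSERROR(CHAMBER), MSERROR(PERSON), MSERROR(HOUR) and their df
mean_squares = [29.21, 0.58, 0.06]
dfs = [6, 18, 48]
# Coefficients that make the combination estimate
# sigma_eps^2 + sigma_p^2 + 2 * sigma_r^2
coefs = [1 / 6, 1 / 6, 2 / 3]

combo = sum(c * ms for c, ms in zip(coefs, mean_squares))
se = sqrt(2 * combo / 6)
df_hat = satterthwaite_df(coefs, mean_squares, dfs)

print(f"estimated variance combination = {combo:.3f}")
print(f"estimated standard error       = {se:.4f}")
print(f"Satterthwaite df               = {df_hat:.2f}")
```

Because the chamber mean square dominates the combination, the approximate degrees of freedom stay close to the 6 degrees of freedom of Error(Chamber).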


26.2.3 Example 26.3: Family Attitudes 

The attitudes of families from rural and urban environments were measured every six 
months for three time periods. The data were obtained from seven rural families and 10 
urban families, each family consisting of a son, father, and mother which are displayed in 
Table 26.16. The model used to describe the data was previously given in Equation 26.2. 
The analysis in this section assumes that the ideal conditions that were given in Section 
26.1 are satisfied. Chapter 27 will consider more general analyses that do not require that 
the ideal conditions be satisfied. 

The analysis of variance table corresponding to the model in Equation 26.2 is given
in Table 26.17. If we operate at $\alpha = 0.05$, there are four significant effects: Area, Person, Time, and
Area × Time.

Thus, there are three comparisons we wish to make: between persons, between times for 
each area, and between areas for each time. 

The variance of a comparison between family members is 


$$
\begin{aligned}
\operatorname{Var}(\bar{y}_{\cdot j\cdot\cdot} - \bar{y}_{\cdot j'\cdot\cdot})
&= \operatorname{Var}[(\bar{\mu}_{\cdot j\cdot} - \bar{\mu}_{\cdot j'\cdot}) + (\bar{p}_{\cdot j\cdot} - \bar{p}_{\cdot j'\cdot}) + (\bar{\varepsilon}_{\cdot j\cdot\cdot} - \bar{\varepsilon}_{\cdot j'\cdot\cdot})] \\
&= \frac{2}{17}\sigma_p^2 + \frac{2}{3\cdot 17}\sigma_\varepsilon^2
 = \frac{2}{3\cdot 17}\left(\sigma_\varepsilon^2 + 3\sigma_p^2\right)
\end{aligned}
$$

TABLE 26.16

Data for Family Attitude Study of Example 26.3

                          Person
              Son            Father          Mother
Family    T1   T2   T3    T1   T2   T3    T1   T2   T3
Urban
   1      17   17   19    18   19   21    16   16   18
   2      12   14   15    19   19   21    16   16   18
   3       8   10   11    16   18   19    11   12   12
   4       5    7    7    12   12   13    13   14   14
   5       2    5    6    12   14   14    14   16   18
   6       9   11   11    16   17   18    14   15   16
   7       8    9    9    19   20   20    15   16   18
   8      13   14   16    16   17   18    18   18   20
   9      11   12   13    13   16   17     7    8   10
  10      19   20   20    13   15   15    11   12   12
Rural
   1      12   11   14    18   19   22    16   16   19
   2      13   13   17    18   19   22    16   16   19
   3      12   13   16    19   18   22    17   16   20
   4      18   18   21    23   23   26    23   22   26
   5      15   14   16    15   15   19    17   17   20
   6       6    6   10    15   16   19    18   19   21
   7      16   17   18    17   17   21    18   20   23


TABLE 26.17

Analysis of Variance Table for Family Attitude Data of Example 26.3

Source of Variation      df      SS        MS         F       EMS
Area                      1    382.12    382.122      7.06    $\sigma_\varepsilon^2 + 3\sigma_p^2 + 9\sigma_f^2 + Q_1$
Error(Family)            15    812.32     54.155              $\sigma_\varepsilon^2 + 3\sigma_p^2 + 9\sigma_f^2$
Person                    2    666.84    330.92      13.5     $\sigma_\varepsilon^2 + 3\sigma_p^2 + Q_2$
Person × Area             2     60.27     30.135      1.23    $\sigma_\varepsilon^2 + 3\sigma_p^2 + Q_3$
Error(Person)            30    735.37     25.512              $\sigma_\varepsilon^2 + 3\sigma_p^2$
Time                      2    204.28    102.14     268.94    $\sigma_\varepsilon^2 + Q_4$
Time × Area               2     30.95     15.475     40.74    $\sigma_\varepsilon^2 + Q_5$
Time × Person             4      1.19      0.298      0.78    $\sigma_\varepsilon^2 + Q_6$
Time × Person × Area      4      1.5       0.375      0.99    $\sigma_\varepsilon^2 + Q_7$
Error(Time)              90     34.18      0.37               $\sigma_\varepsilon^2$

Note: $Q_i$ denotes the noncentrality parameter corresponding to the respective sums
of squares.

The estimate of the standard error of the difference of two family member means is

$$
\widehat{\mathrm{s.e.}}(\bar{y}_{\cdot j\cdot\cdot} - \bar{y}_{\cdot j'\cdot\cdot})
= \sqrt{\frac{2}{3\cdot 17}\,\mathrm{MSError(Person)}}
= \sqrt{\frac{2}{3\cdot 17}(25.512)} = 0.9906
$$

A 5% LSD value is $\mathrm{LSD}_{0.05} = 2.042(0.99) = 2.022$. Table 26.18 has a summary of the pairwise
comparisons. The variance of the difference of two time means within a specified area is


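$$
\operatorname{Var}(\bar{y}_{i\cdot k\cdot} - \bar{y}_{i\cdot k'\cdot})
= \operatorname{Var}[(\bar{\mu}_{i\cdot k} - \bar{\mu}_{i\cdot k'}) + (\bar{\varepsilon}_{i\cdot k\cdot} - \bar{\varepsilon}_{i\cdot k'\cdot})]
= \frac{2}{3n_i}\,\sigma_\varepsilon^2
$$

where $n_i$ is the number of families in area $i$.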

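The estimate of the standard error to compare two times in the urban area is

$$
\widehat{\mathrm{s.e.}}(\bar{y}_{1\cdot k\cdot} - \bar{y}_{1\cdot k'\cdot})
= \sqrt{\frac{2}{3\cdot 10}\,\mathrm{MSError(Time)}} = \sqrt{\frac{2}{30}(0.370)} = 0.157
$$

and the standard error to compare two times in a rural area is

$$
\widehat{\mathrm{s.e.}}(\bar{y}_{2\cdot k\cdot} - \bar{y}_{2\cdot k'\cdot})
= \sqrt{\frac{2}{3\cdot 7}\,\mathrm{MSError(Time)}} = \sqrt{\frac{2}{21}(0.370)} = 0.188
$$

The corresponding 5% LSDs are $\mathrm{LSD}_{0.05}(\mathrm{Urban}) = 1.987(0.157) = 0.312$ and $\mathrm{LSD}_{0.05}(\mathrm{Rural}) =
1.987(0.188) = 0.374$. The multiple comparisons are summarized in Table 26.19.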
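
The two within-area standard errors and LSD values can be reproduced with a few lines of code. This is a minimal sketch under the assumption that the t quantile 1.987 (90 degrees of freedom) is taken from the text rather than computed; the function and variable names are illustrative only.

```python
from math import sqrt

MS_ERROR_TIME = 0.370  # MSError(Time) as used in the text
T_CRIT = 1.987         # two-sided 5% t quantile quoted in the text (90 df)
N_PERSONS = 3          # son, father, and mother in every family


def se_time_difference(n_families):
    """Standard error of the difference of two time means within one area."""
    return sqrt(2 * MS_ERROR_TIME / (N_PERSONS * n_families))


results = {}
for area, n_families in {"Urban": 10, "Rural": 7}.items():
    se = se_time_difference(n_families)
    # The text rounds the standard error to three decimals before
    # multiplying by the t quantile, so the same is done here.
    lsd = T_CRIT * round(se, 3)
    results[area] = (se, lsd)
    print(f"{area}: s.e. = {se:.3f}, LSD = {lsd:.3f}")
```

Because the rural area has fewer families, its standard error and LSD are larger than those for the urban area.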


TABLE 26.18

Comparison of Person Means for Example 26.3

Person     Mean
Son        12.67 (a)
Father     17.47 (b)
Mother     16.53 (b)

Note: Means within a column with the same letter are
not significantly different. $\mathrm{LSD}_{0.05} = 2.022$.


TABLE 26.19

Comparison of Time Means within Each Area
for Example 26.3

                  Area
Time      Rural         Urban
1         16.33 (a)     13.10 (a)
2         16.38 (a)     14.30 (b)
3         19.61 (b)     15.30 (c)

Note: Means within a column with the same letter
are not significantly different. $\mathrm{LSD}_{0.05} = 0.374$
for Rural and $\mathrm{LSD}_{0.05} = 0.312$ for Urban.

TABLE 26.20

Comparison of Area Means at Each Time for
Example 26.3

                  Area
Time      Rural         Urban
1         16.33 (a)     13.10 (b)
2         16.38 (a)     14.30 (a)
3         19.61 (a)     15.30 (b)

Note: Means within a row with the same letter are
not significantly different. $\mathrm{LSD}_{0.05} = 2.59$.


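Next the variance of the difference between urban and rural means at a given time
point is

$$
\begin{aligned}
\operatorname{Var}(\bar{y}_{1\cdot k\cdot} - \bar{y}_{2\cdot k\cdot})
&= \operatorname{Var}[(\bar{f}_{1\cdot} - \bar{f}_{2\cdot}) + (\bar{p}_{1\cdot\cdot} - \bar{p}_{2\cdot\cdot}) + (\bar{\varepsilon}_{1\cdot k\cdot} - \bar{\varepsilon}_{2\cdot k\cdot})] \\
&= \sigma_f^2\left(\frac{1}{10} + \frac{1}{7}\right)
 + \sigma_p^2\left(\frac{1}{3\cdot 10} + \frac{1}{3\cdot 7}\right)
 + \sigma_\varepsilon^2\left(\frac{1}{3\cdot 10} + \frac{1}{3\cdot 7}\right) \\
&= \left(\frac{1}{30} + \frac{1}{21}\right)\left(\sigma_\varepsilon^2 + \sigma_p^2 + 3\sigma_f^2\right)
\end{aligned}
$$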

The function of the variance components in the above expression can be estimated by

$$
\tfrac{1}{3}\,\mathrm{MSError(Family)} + \tfrac{2}{3}\,\mathrm{MSError(Time)} = \tfrac{1}{3}(54.155) + \tfrac{2}{3}(0.370) = 18.2983
$$

Therefore, the estimated standard error is

$$
\widehat{\mathrm{s.e.}}(\bar{y}_{1\cdot k\cdot} - \bar{y}_{2\cdot k\cdot}) = \sqrt{\left(\frac{1}{30} + \frac{1}{21}\right)(18.2983)} = 1.217
$$

and the Satterthwaite estimated degrees of freedom corresponding to this estimate is

$$
\frac{\left[\tfrac{1}{3}(54.155) + \tfrac{2}{3}(0.370)\right]^2}
{\dfrac{(\tfrac{1}{3})^2(54.155)^2}{15} + \dfrac{(\tfrac{2}{3})^2(0.370)^2}{90}}
= \frac{(18.2983)^2}{21.7242 + 0.0007} = \frac{334.8278}{21.7249} = 15.4
$$

Thus an approximate 5% LSD value for comparing urban to rural at a given time point
is $\mathrm{LSD}_{0.05} = 2.131(1.217) = 2.59$. The multiple comparisons are summarized in Table 26.20.
The linear and quadratic trends of time in each area can be investigated using the method
described in Example 26.1.
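
The weights 1/3 and 2/3 attached to the mean squares follow from matching coefficients in the expected mean squares of Table 26.17. The sketch below is an illustration of that matching step and of the resulting Satterthwaite calculation; the variable names are assumptions, and the back-substitution relies on each error stratum introducing exactly one new variance component.

```python
from fractions import Fraction
from math import sqrt

# Coefficients of (sigma_e^2, sigma_p^2, sigma_f^2) in each expected
# mean square, in the order Error(Family), Error(Person), Error(Time).
EMS = [(1, 3, 9), (1, 3, 0), (1, 0, 0)]
TARGET = (1, 1, 3)  # sigma_e^2 + sigma_p^2 + 3 sigma_f^2

# Back-substitution: sigma_f^2 occurs only in Error(Family), and
# sigma_p^2 occurs only in Error(Family) and Error(Person).
w_family = Fraction(TARGET[2], EMS[0][2])
w_person = (TARGET[1] - w_family * EMS[0][1]) / EMS[1][1]
w_time = TARGET[0] - w_family * EMS[0][0] - w_person * EMS[1][0]
weights = [w_family, w_person, w_time]

mean_squares = [54.155, 25.512, 0.370]  # values as printed in the text
dfs = [15, 30, 90]

terms = [float(w) * ms for w, ms in zip(weights, mean_squares)]
estimate = sum(terms)
df_satt = estimate ** 2 / sum(t ** 2 / d for t, d in zip(terms, dfs))
std_err = sqrt((1 / 30 + 1 / 21) * estimate)

print("weights:", [str(w) for w in weights])
print(f"estimate = {estimate:.4f}, s.e. = {std_err:.3f}, df = {df_satt:.1f}")
```

The weight for Error(Person) comes out as zero, which is why that mean square does not appear in the estimate used in the text.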


26.3 Data Analyses Using the SAS-Mixed Procedure 

This section illustrates SAS®-Mixed analyses of each of the three examples described in
the preceding section. Consider the data in Table 26.4. To reproduce the results shown in
Section 26.2, one can use the SAS commands shown in Table 26.21.

TABLE 26.21

SAS-Mixed Code to Analyze the Data in Table 26.4

DATA HRT_RATE;
  /* Three (Drug, PERSON, HR1-HR4) records are read from each data line */
  INPUT Drug $ PERSON HR1-HR4 @@;
CARDS;
AX23  1 72 86 81 77   BWW9  2 85 86 83 80   CTRL  3 69 73 72 74
AX23  4 78 83 88 81   BWW9  5 82 86 80 84   CTRL  6 66 62 67 73
AX23  7 71 82 81 75   BWW9  8 71 78 70 75   CTRL  9 84 90 88 87
AX23 10 72 83 83 69   BWW9 11 83 88 79 81   CTRL 12 80 81 77 72
AX23 13 66 79 77 66   BWW9 14 86 85 76 76   CTRL 15 72 72 69 70
AX23 16 74 83 84 77   BWW9 17 85 82 83 80   CTRL 18 65 62 65 61
AX23 19 62 73 78 70   BWW9 20 79 83 80 81   CTRL 21 75 69 69 68
AX23 22 69 75 76 70   BWW9 23 83 84 78 81   CTRL 24 71 70 65 63
;
/* Rearrange the data so that there is one observation per person and time */
DATA; SET HRT_RATE; DROP HR1-HR4;
  Time=1; HR=HR1; OUTPUT;
  Time=2; HR=HR2; OUTPUT;
  Time=3; HR=HR3; OUTPUT;
  Time=4; HR=HR4; OUTPUT;
RUN;

PROC MIXED;
  CLASSES Drug Time PERSON;
  MODEL HR=Drug Time Drug*Time/DDFM=SATTERTH;
  RANDOM PERSON(Drug);
  LSMEANS Drug|Time/PDIFF;
  ODS OUTPUT LSMEANS=LSMS;
RUN;

PROC PRINT DATA=LSMS;
RUN;

/* Plot the Drug x Time least squares means against time */
SYMBOL1 V=DOT COLOR=BLACK I=JOIN;
SYMBOL2 V=CIRCLE COLOR=BLACK I=JOIN;
SYMBOL3 V=TRIANGLE COLOR=BLACK I=JOIN;
PROC GPLOT DATA=LSMS; WHERE EFFECT='Drug*Time';
  PLOT ESTIMATE*Time=Drug/VAXIS=70 TO 90 BY 5;
RUN;
QUIT;

TABLE 26.22

Covariance Parameter Estimates for the Data
in Table 26.4

Covariance Parameter Estimates
Covariance Parameter     Estimate
Person(Drug)             25.9702
Residual                  7.4479

TABLE 26.23

Statistical Tests for Drug, Time, and Drug × Time
Effects for the Data in Table 26.4

Type III Tests of Fixed Effects
Effect          Num df    Den df    F-Value    Pr > F
Drug               2        21        5.99     0.0088
Time               3        63       12.96     <0.0001
Drug × Time        6        63       11.80     <0.0001

26.3.1 Example 26.1 

One portion of the output that should be noted is the portion that provides estimates of the
two variance components, $\sigma_\varepsilon^2$ and $\sigma_p^2$. The estimates are given in Table 26.22. One can see
that $\hat{\sigma}_p^2 = 25.97$ and $\hat{\sigma}_\varepsilon^2 = 7.45$.

Tests corresponding to the ANOVA in Table 26.5 are given in Table 26.23. Note that
the MIXED procedure does not provide error sums of squares, but it does give the same
F-statistics as given in Table 26.5.

Drug main effect means, Time main effect means, and the Drug × Time two-way means
are given in Table 26.24, and pairwise comparisons among various subsets of these means
are given in Tables 26.25-26.27. Estimate options that can be used in the MIXED procedure
to compute linear and quadratic contrasts in time for each drug are given in Table 26.28.
The results of these options are shown in Table 26.29.
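
The standard errors reported in Tables 26.24-26.27 can be recovered from the two variance component estimates in Table 26.22. The following is a sketch under the assumptions of the balanced split-plot-in-time model (eight persons per drug, four times per person); the dictionary labels are illustrative and are not SAS output names.

```python
from math import sqrt

SIGMA2_PERSON = 25.9702  # Person(Drug) estimate from Table 26.22
SIGMA2_ERROR = 7.4479    # Residual estimate from Table 26.22
N_PERSONS = 8            # persons per drug (24 persons, 3 drugs)
N_DRUGS = 3
N_TIMES = 4

# Variance of a single observation and of one person's mean over time.
var_obs = SIGMA2_PERSON + SIGMA2_ERROR
var_person_mean = SIGMA2_PERSON + SIGMA2_ERROR / N_TIMES

standard_errors = {
    "drug mean": sqrt(var_person_mean / N_PERSONS),
    "time mean": sqrt(var_obs / (N_PERSONS * N_DRUGS)),
    "drug x time mean": sqrt(var_obs / N_PERSONS),
    "drug difference": sqrt(2 * var_person_mean / N_PERSONS),
    # Person effects cancel when times are compared on the same persons.
    "time difference": sqrt(2 * SIGMA2_ERROR / (N_PERSONS * N_DRUGS)),
    "time difference within a drug": sqrt(2 * SIGMA2_ERROR / N_PERSONS),
    "drug difference within a time": sqrt(2 * var_obs / N_PERSONS),
}

for label, value in standard_errors.items():
    print(f"{label:32s}{value:8.4f}")
```

Comparisons made within a person involve only the residual component, which is why the time differences have much smaller standard errors than the drug differences.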


TABLE 26.24

Drug, Time and Drug × Time Means for the Data in Table 26.4

Least Squares Means
Effect         Drug    Time    Estimate    Standard Error    df      t-Value    Pr > |t|
Drug           AX23            76.2813     1.8652            21      40.90      <0.0001
Drug           BWW9            81.0312     1.8652            21      43.44      <0.0001
Drug           CTRL            71.9062     1.8652            21      38.55      <0.0001
Time                   1       75.0000     1.1800            29.9    63.56      <0.0001
Time                   2       78.9583     1.1800            29.9    66.91      <0.0001
Time                   3       77.0417     1.1800            29.9    65.29      <0.0001
Time                   4       74.6250     1.1800            29.9    63.24      <0.0001
Drug × Time    AX23    1       70.5000     2.0438            29.9    34.49      <0.0001
Drug × Time    AX23    2       80.5000     2.0438            29.9    39.39      <0.0001
Drug × Time    AX23    3       81.0000     2.0438            29.9    39.63      <0.0001
Drug × Time    AX23    4       73.1250     2.0438            29.9    35.78      <0.0001
Drug × Time    BWW9    1       81.7500     2.0438            29.9    40.00      <0.0001
Drug × Time    BWW9    2       84.0000     2.0438            29.9    41.10      <0.0001
Drug × Time    BWW9    3       78.6250     2.0438            29.9    38.47      <0.0001
Drug × Time    BWW9    4       79.7500     2.0438            29.9    39.02      <0.0001
Drug × Time    CTRL    1       72.7500     2.0438            29.9    35.59      <0.0001
Drug × Time    CTRL    2       72.3750     2.0438            29.9    35.41      <0.0001
Drug × Time    CTRL    3       71.5000     2.0438            29.9    34.98      <0.0001
Drug × Time    CTRL    4       71.0000     2.0438            29.9    34.74      <0.0001

TABLE 26.25

Pairwise Comparisons among Drug Main Effect Means and Time Main Effect Means for the Data
in Table 26.4

Differences of Least Squares Means
Effect    Drug    Time    Drug    Time    Estimate    Standard Error    df    t-Value    Pr > |t|
Drug      AX23            BWW9            -4.7500     2.6378            21    -1.80      0.0861
Drug      AX23            CTRL             4.3750     2.6378            21     1.66      0.1121
Drug      BWW9            CTRL             9.1250     2.6378            21     3.46      0.0023
Time              1               2       -3.9583     0.7878            63    -5.02      <0.0001
Time              1               3       -2.0417     0.7878            63    -2.59      0.0119
Time              1               4        0.3750     0.7878            63     0.48      0.6357
Time              2               3        1.9167     0.7878            63     2.43      0.0178
Time              2               4        4.3333     0.7878            63     5.50      <0.0001
Time              3               4        2.4167     0.7878            63     3.07      0.0032

TABLE 26.26

Comparisons of Time Means within Each Drug for the Data in Table 26.4

Differences of Least Squares Means
Effect         Drug    Time    Drug    Time    Estimate     Standard Error    df    t-Value    Pr > |t|
Drug × Time    AX23    1       AX23    2       -10.0000     1.3645            63    -7.33      <0.0001
Drug × Time    AX23    1       AX23    3       -10.5000     1.3645            63    -7.69      <0.0001
Drug × Time    AX23    1       AX23    4        -2.6250     1.3645            63    -1.92      0.0589
Drug × Time    AX23    2       AX23    3        -0.5000     1.3645            63    -0.37      0.7153
Drug × Time    AX23    2       AX23    4         7.3750     1.3645            63     5.40      <0.0001
Drug × Time    AX23    3       AX23    4         7.8750     1.3645            63     5.77      <0.0001
Drug × Time    BWW9    1       BWW9    2        -2.2500     1.3645            63    -1.65      0.1041
Drug × Time    BWW9    1       BWW9    3         3.1250     1.3645            63     2.29      0.0254
Drug × Time    BWW9    1       BWW9    4         2.0000     1.3645            63     1.47      0.1477
Drug × Time    BWW9    2       BWW9    3         5.3750     1.3645            63     3.94      0.0002
Drug × Time    BWW9    2       BWW9    4         4.2500     1.3645            63     3.11      0.0028
Drug × Time    BWW9    3       BWW9    4        -1.1250     1.3645            63    -0.82      0.4128
Drug × Time    CTRL    1       CTRL    2         0.3750     1.3645            63     0.27      0.7844
Drug × Time    CTRL    1       CTRL    3         1.2500     1.3645            63     0.92      0.3631
Drug × Time    CTRL    1       CTRL    4         1.7500     1.3645            63     1.28      0.2044
Drug × Time    CTRL    2       CTRL    3         0.8750     1.3645            63     0.64      0.5237
Drug × Time    CTRL    2       CTRL    4         1.3750     1.3645            63     1.01      0.3175
Drug × Time    CTRL    3       CTRL    4         0.5000     1.3645            63     0.37      0.7153

TABLE 26.27

Comparisons of Drug Means within Each Time for the Data in Table 26.4

Differences of Least Squares Means
Effect         Drug    Time    Drug    Time    Estimate     Standard Error    df      t-Value    Pr > |t|
Drug × Time    AX23    1       BWW9    1       -11.2500     2.8904            29.9    -3.89      0.0005
Drug × Time    AX23    1       CTRL    1        -2.2500     2.8904            29.9    -0.78      0.4424
Drug × Time    AX23    2       BWW9    2        -3.5000     2.8904            29.9    -1.21      0.2354
Drug × Time    AX23    2       CTRL    2         8.1250     2.8904            29.9     2.81      0.0086
Drug × Time    AX23    3       BWW9    3         2.3750     2.8904            29.9     0.82      0.4178
Drug × Time    AX23    3       CTRL    3         9.5000     2.8904            29.9     3.29      0.0026
Drug × Time    AX23    4       BWW9    4        -6.6250     2.8904            29.9    -2.29      0.0291
Drug × Time    AX23    4       CTRL    4         2.1250     2.8904            29.9     0.74      0.4680
Drug × Time    BWW9    1       CTRL    1         9.0000     2.8904            29.9     3.11      0.0041
Drug × Time    BWW9    2       CTRL    2        11.6250     2.8904            29.9     4.02      0.0004
Drug × Time    BWW9    3       CTRL    3         7.1250     2.8904            29.9     2.47      0.0197
Drug × Time    BWW9    4       CTRL    4         8.7500     2.8904            29.9     3.03      0.0050

TABLE 26.28

Estimate Options to Compute Linear and Quadratic Contrasts in Time for Each
Drug for the Data in Table 26.4

ESTIMATE 'TIME LINEAR FOR AX23' TIME -3 -1 1 3 DRUG*TIME -3 -1 1 3  0 0 0 0  0 0 0 0;
ESTIMATE 'TIME LINEAR FOR BWW9' TIME -3 -1 1 3 DRUG*TIME 0 0 0 0  -3 -1 1 3  0 0 0 0;
ESTIMATE 'TIME LINEAR FOR CNTL' TIME -3 -1 1 3 DRUG*TIME 0 0 0 0  0 0 0 0  -3 -1 1 3;
ESTIMATE 'TIME QUAD FOR AX23'   TIME 1 -1 -1 1 DRUG*TIME 1 -1 -1 1  0 0 0 0  0 0 0 0;
ESTIMATE 'TIME QUAD FOR BWW9'   TIME 1 -1 -1 1 DRUG*TIME 0 0 0 0  1 -1 -1 1  0 0 0 0;
ESTIMATE 'TIME QUAD FOR CNTL'   TIME 1 -1 -1 1 DRUG*TIME 0 0 0 0  0 0 0 0  1 -1 -1 1;
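
The contrasts requested by these ESTIMATE options can be checked by hand from the Drug × Time means in Table 26.24. The sketch below applies the linear and quadratic coefficients to those means and computes the standard error from the residual variance estimate in Table 26.22; the function name `time_contrast` is illustrative, and the standard error formula assumes eight persons per drug with person effects cancelling because the coefficients sum to zero.

```python
from math import sqrt

# Drug x Time least squares means from Table 26.24 (times 1 to 4).
cell_means = {
    "AX23": [70.5000, 80.5000, 81.0000, 73.1250],
    "BWW9": [81.7500, 84.0000, 78.6250, 79.7500],
    "CTRL": [72.7500, 72.3750, 71.5000, 71.0000],
}
coefficients = {"linear": [-3, -1, 1, 3], "quad": [1, -1, -1, 1]}

SIGMA2_ERROR = 7.4479  # Residual estimate from Table 26.22
N_PERSONS = 8          # persons per drug


def time_contrast(coefs, means):
    """Return the contrast estimate and its standard error."""
    estimate = sum(c * m for c, m in zip(coefs, means))
    std_err = sqrt(sum(c * c for c in coefs) * SIGMA2_ERROR / N_PERSONS)
    return estimate, std_err


results = {}
for kind, coefs in coefficients.items():
    for drug, means in cell_means.items():
        results[(kind, drug)] = time_contrast(coefs, means)
        est, se = results[(kind, drug)]
        print(f"Time {kind} for {drug}: estimate = {est:8.4f}, s.e. = {se:.4f}")
```

The t-value for each contrast is the estimate divided by its standard error, referred to the 63 residual degrees of freedom.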


TABLE 26.29

Linear and Quadratic Contrasts in Time for Each Drug for the Data in Table 26.4

Estimates
Label                    Estimate     Standard Error    df    t-Value    Pr > |t|
Time linear for AX23       8.3750     4.3151            63     1.94      0.0568
Time linear for BWW9     -11.3750     4.3151            63    -2.64      0.0105
Time linear for CNTL      -6.1250     4.3151            63    -1.42      0.1607
Time quad for AX23       -17.8750     1.9298            63    -9.26      <0.0001
Time quad for BWW9        -1.1250     1.9298            63    -0.58      0.5620
Time quad for CNTL        -0.1250     1.9298            63    -0.06      0.9486

TABLE 26.30

SAS Commands to Analyze the Complex Comfort Experiment

PROC MIXED DATA=COMFORT;
  CLASSES ENV REP SEX CLO TIME;
  MODEL SCORE=ENV|SEX|CLO|TIME/DDFM=SATTERTH;
  RANDOM REP(ENV) SEX*CLO*REP(ENV);
  LSMEANS SEX*CLO ENV*TIME/PDIFF;
  ESTIMATE 'LINEAR TIME FOR ENV 1' TIME -1 0 1 ENV*TIME -1 0 1  0 0 0  0 0 0;
  ESTIMATE 'LINEAR TIME FOR ENV 2' TIME -1 0 1 ENV*TIME 0 0 0  -1 0 1  0 0 0;
  ESTIMATE 'LINEAR TIME FOR ENV 3' TIME -1 0 1 ENV*TIME 0 0 0  0 0 0  -1 0 1;
  ESTIMATE 'QUAD TIME FOR ENV 1' TIME 1 -2 1 ENV*TIME 1 -2 1  0 0 0  0 0 0;
  ESTIMATE 'QUAD TIME FOR ENV 2' TIME 1 -2 1 ENV*TIME 0 0 0  1 -2 1  0 0 0;
  ESTIMATE 'QUAD TIME FOR ENV 3' TIME 1 -2 1 ENV*TIME 0 0 0  0 0 0  1 -2 1;
  ESTIMATE 'ET11 VERSUS ET22' ENV 1 -1 0 TIME 1 -1 0 ENV*TIME 1 0 0  0 -1 0  0 0 0;
  ODS OUTPUT LSMEANS=LSMS DIFFS=LSMDIFFS;
RUN;

DATA LSMDIFFS1; SET LSMDIFFS; DROP SEX CLO _SEX _CLO;
PROC PRINT DATA=LSMDIFFS1; WHERE TIME=_TIME AND EFFECT='ENV*TIME';
PROC PRINT DATA=LSMDIFFS1; WHERE ENV=_ENV AND EFFECT='ENV*TIME';
DATA LSMDIFFS2; SET LSMDIFFS; DROP ENV TIME _ENV _TIME;
PROC PRINT DATA=LSMDIFFS2; WHERE EFFECT='SEX*CLO';
RUN;

SYMBOL1 V=DOT I=JOIN COLOR=BLACK;
SYMBOL2 V=CIRCLE I=JOIN COLOR=BLACK;
SYMBOL3 V=TRIANGLE I=JOIN COLOR=BLACK;
PROC GPLOT DATA=LSMS; WHERE EFFECT='ENV*TIME';
  PLOT ESTIMATE*TIME=ENV;
RUN;


26.3.2 Example 26.2 

This section provides MIXED analyses of the Complex Comfort Experiment. The data in
Example 26.2 were analyzed using the SAS commands shown in Table 26.30. The estimates
of each of the three variance components corresponding to chamber, person, and time are
given in Table 26.31. The type III tests of fixed effects are given in Table 26.32. Linear and
quadratic contrasts for time within each environment are given in Table 26.33. The table
also contains a comparison of Environment 1 at Time 1 to Environment 2 at Time 2 to
illustrate such a comparison should it be of interest to someone. Table 26.34 gives the
Sex × Clothing means and the Environment × Time means. Tables 26.35-26.37 provide
pairwise comparisons between various subsets of these means.

TABLE 26.31

Estimates of the Chamber, Person, and Time
Variance Components for the Comfort Study

Covariance Parameter Estimates
Covariance Parameter      Estimate
Rep(Env)                  2.3861
Rep × Sex × Clo(Env)      0.1729
Residual                  0.05819

TABLE 26.32

Type III Tests of Fixed Effects for the Complex Comfort Experiment

Type III Tests of Fixed Effects
Effect                    Num df    Den df    F-Value     Pr > F
Env                          2         6         3.28     0.1090
Sex                          1        18       501.89     <0.0001
Env × Sex                    2        18         0.68     0.5201
Clo                          1        18      1397.70     <0.0001
Env × Clo                    2        18         3.73     0.0440
Sex × Clo                    1        18        97.04     <0.0001
Env × Sex × Clo              2        18         3.82     0.0414
Time                         2        48      1672.23     <0.0001
Env × Time                   4        48       479.70     <0.0001
Sex × Time                   2        48         0.67     0.5164
Env × Sex × Time             4        48         0.78     0.5449
Clo × Time                   2        48         0.71     0.4954
Env × Clo × Time             4        48         1.14     0.3499
Sex × Clo × Time             2        48         1.68     0.1978
Env × Sex × Clo × Time       4        48         0.71     0.5896

TABLE 26.33

Linear and Quadratic Contrasts for Time within Each Environment
for the Comfort Study

Estimates
Label                    Estimate     Standard Error    df      t-Value    Pr > |t|
Linear time for Env 1    -3.1016      0.09848           48      -31.50     <0.0001
Linear time for Env 2    -1.8741      0.09848           48      -19.03     <0.0001
Linear time for Env 3    -0.04309     0.09848           48       -0.44     0.6637
Quad time for Env 1       8.8988      0.1706            48       52.17     <0.0001
Quad time for Env 2       5.8761      0.1706            48       34.45     <0.0001
Quad time for Env 3      -0.06653     0.1706            48       -0.39     0.6982
ET11 vs ET22              8.0498      1.2764            6.05      6.31     0.0007


26.3.3 Example 26.3 

This section provides MIXED analyses of the family member experiment. The data in
Example 26.3 were analyzed using the SAS commands shown in Table 26.38. The estimates
of each of the three variance components corresponding to area, family member, and time
are given in Table 26.39. The type III tests of fixed effects are given in Table 26.40. Table 26.41

TABLE 26.34

Sex × Clothing Means and Environment × Time Means for the Comfort
Study

Least Squares Means
Effect        Env    Sex    Clo    Time    Estimate    Standard Error    df      t-Value    Pr > |t|
Sex × Clo            1      1              9.8396      0.5352            6.72    18.38      <0.0001
Sex × Clo            1      2              13.8639     0.5352            6.72    25.90      <0.0001
Sex × Clo            2      1              5.1256      0.5352            6.72     9.58      <0.0001
Sex × Clo            2      2              12.0294     0.5352            6.72    22.47      <0.0001
Env × Time    1                    1       15.1066     0.9026            6.05    16.74      <0.0001
Env × Time    1                    2       9.1065      0.9026            6.05    10.09      <0.0001
Env × Time    1                    3       12.0050     0.9026            6.05    13.30      <0.0001
Env × Time    2                    1       10.9319     0.9026            6.05    12.11      <0.0001
Env × Time    2                    2       7.0568      0.9026            6.05     7.82      0.0002
Env × Time    2                    3       9.0578      0.9026            6.05    10.04      <0.0001
Env × Time    3                    1       9.5661      0.9026            6.05    10.60      <0.0001
Env × Time    3                    2       9.5778      0.9026            6.05    10.61      <0.0001
Env × Time    3                    3       9.5230      0.9026            6.05    10.55      <0.0001


TABLE 26.35

Pairwise Comparisons of the Sex × Clothing Means for the Comfort Study

Effect       Sex    Clo    Sex    Clo    Estimate    Standard Error    df    t-Value    Pr > |t|
Sex × Clo    1      1      1      2      -4.0243     0.2067            18    -19.47     <0.0001
Sex × Clo    1      1      2      1       4.7140     0.2067            18     22.81     <0.0001
Sex × Clo    1      1      2      2      -2.1898     0.2067            18    -10.59     <0.0001
Sex × Clo    1      2      2      1       8.7383     0.2067            18     42.28     <0.0001
Sex × Clo    1      2      2      2       1.8345     0.2067            18      8.88     <0.0001
Sex × Clo    2      1      2      2      -6.9038     0.2067            18    -33.40     <0.0001


TABLE 26.36

Pairwise Comparisons of the Times within Each Environment for the Comfort Study

Effect        Env    Time    Env    Time    Estimate     Standard Error    df    t-Value    Pr > |t|
Env × Time    1      1       1      2        6.0002      0.09848           48     60.93     <0.0001
Env × Time    1      1       1      3        3.1016      0.09848           48     31.50     <0.0001
Env × Time    1      2       1      3       -2.8986      0.09848           48    -29.43     <0.0001
Env × Time    2      1       2      2        3.8751      0.09848           48     39.35     <0.0001
Env × Time    2      1       2      3        1.8741      0.09848           48     19.03     <0.0001
Env × Time    2      2       2      3       -2.0010      0.09848           48    -20.32     <0.0001
Env × Time    3      1       3      2       -0.01172     0.09848           48     -0.12     0.9057
Env × Time    3      1       3      3        0.04309     0.09848           48      0.44     0.6637
Env × Time    3      2       3      3        0.05481     0.09848           48      0.56     0.5804

TABLE 26.37

Pairwise Comparisons of Environments within Each Time for the Comfort Study

Effect        Env    Time    Env    Time    Estimate    Standard Error    df      t-Value    Pr > |t|
Env × Time    1      1       2      1        4.1747     1.2764            6.05     3.27      0.0168
Env × Time    1      1       3      1        5.5405     1.2764            6.05     4.34      0.0048
Env × Time    1      2       2      2        2.0496     1.2764            6.05     1.61      0.1591
Env × Time    1      2       3      2       -0.4714     1.2764            6.05    -0.37      0.7245
Env × Time    1      3       2      3        2.9472     1.2764            6.05     2.31      0.0600
Env × Time    1      3       3      3        2.4820     1.2764            6.05     1.94      0.0994
Env × Time    2      1       3      1        1.3658     1.2764            6.05     1.07      0.3254
Env × Time    2      2       3      2       -2.5210     1.2764            6.05    -1.98      0.0953
Env × Time    2      3       3      3       -0.4652     1.2764            6.05    -0.36      0.7279


TABLE 26.38

SAS Commands to Analyze the Family Attitudes
Experiment

DATA AMD26_3;
  INPUT AREA $ FAM ATTITUD1-ATTITUD9;
  /* Give rural families distinct identification numbers (11-17) */
  IF AREA='RURAL' THEN FAM=FAM+10;
CARDS;
URBAN  1 17 17 19 18 19 21 16 16 18
URBAN  2 12 14 15 19 19 21 16 16 18
URBAN  3  8 10 11 16 18 19 11 12 12
URBAN  4  5  7  7 12 12 13 13 14 14
URBAN  5  2  5  6 12 14 14 14 16 18
URBAN  6  9 11 11 16 17 18 14 15 16
URBAN  7  8  9  9 19 20 20 15 16 18
URBAN  8 13 14 16 16 17 18 18 18 20
URBAN  9 11 12 13 13 16 17  7  8 10
URBAN 10 19 20 20 13 15 15 11 12 12
RURAL  1 12 11 14 18 19 22 16 16 19
RURAL  2 13 13 17 16 15 19 19 19 23
RURAL  3 12 13 16 19 18 22 17 16 20
RURAL  4 18 18 21 23 23 26 23 22 26
RURAL  5 15 14 16 15 15 19 17 17 20
RURAL  6  6  6 10 15 16 19 18 19 21
RURAL  7 16 17 18 17 17 21 18 20 23
;
PROC PRINT;
RUN;

/* Rearrange the data: one observation per family member and time */
DATA USUAL; SET AMD26_3; DROP ATTITUD1-ATTITUD9;
  FMEMB=1; TIME= 0; ATTITUD=ATTITUD1; OUTPUT;
  FMEMB=1; TIME= 6; ATTITUD=ATTITUD2; OUTPUT;
  FMEMB=1; TIME=12; ATTITUD=ATTITUD3; OUTPUT;
  FMEMB=2; TIME= 0; ATTITUD=ATTITUD4; OUTPUT;
  FMEMB=2; TIME= 6; ATTITUD=ATTITUD5; OUTPUT;
  FMEMB=2; TIME=12; ATTITUD=ATTITUD6; OUTPUT;
  FMEMB=3; TIME= 0; ATTITUD=ATTITUD7; OUTPUT;
  FMEMB=3; TIME= 6; ATTITUD=ATTITUD8; OUTPUT;
  FMEMB=3; TIME=12; ATTITUD=ATTITUD9; OUTPUT;
RUN;

PROC MIXED DATA=USUAL;
  CLASSES AREA FMEMB TIME FAM;
  MODEL ATTITUD=AREA|FMEMB|TIME/DDFM=SATTERTH;
  RANDOM FAM(AREA) FMEMB*FAM(AREA);
  LSMEANS AREA*TIME/PDIFF;
RUN;
QUIT;

TABLE 26.39

Covariance Parameter Estimates for the Family
Attitudes Experiment

Covariance Parameter Estimates
Covariance Parameter     Estimate
Fam(Area)                3.2936
Fmemb × Fam(Area)        8.0442
Residual                 0.3798


TABLE 26.40

Type III Tests of Fixed Effects for the Family Attitudes Experiment

Type III Tests of Fixed Effects
Effect                  Num df    Den df    F-Value    Pr > F
Area                       1        15        7.06     0.0180
Fmemb                      2        30       13.47     <0.0001
Area × Fmemb               2        30        1.23     0.3068
Time                       2        90      268.94     <0.0001
Area × Time                2        90       40.74     <0.0001
Fmemb × Time               4        90        0.78     0.5409
Area × Fmemb × Time        4        90        0.99     0.4190


TABLE 26.41

Area × Time Two-Way Means for the Family Attitudes Experiment

Least Squares Means
Effect         Area     Time    Estimate    Standard Error    df      t-Value    Pr > |t|
Area × Time    Rural     0      16.3333     0.9336            15.4    17.49      <0.0001
Area × Time    Rural     6      16.3810     0.9336            15.4    17.55      <0.0001
Area × Time    Rural    12      19.6190     0.9336            15.4    21.01      <0.0001
Area × Time    Urban     0      13.1000     0.7811            15.4    16.77      <0.0001
Area × Time    Urban     6      14.3000     0.7811            15.4    18.31      <0.0001
Area × Time    Urban    12      15.3000     0.7811            15.4    19.59      <0.0001

TABLE 26.42

Pairwise Comparisons among the Area × Time Two-Way Means for the Family Attitudes
Experiment

Differences of Least Squares Means
Effect         Area     Time    Area     Time    Estimate     Standard Error    df      t-Value    Pr > |t|
Area × Time    Rural     0      Rural     6      -0.04762     0.1902            90       -0.25     0.8029
Area × Time    Rural     0      Rural    12      -3.2857      0.1902            90      -17.28     <0.0001
Area × Time    Rural     0      Urban     0       3.2333      1.2173            15.4      2.66     0.0177
Area × Time    Rural     0      Urban     6       2.0333      1.2173            15.4      1.67     0.1150
Area × Time    Rural     0      Urban    12       1.0333      1.2173            15.4      0.85     0.4089
Area × Time    Rural     6      Rural    12      -3.2381      0.1902            90      -17.03     <0.0001
Area × Time    Rural     6      Urban     0       3.2810      1.2173            15.4      2.70     0.0163
Area × Time    Rural     6      Urban     6       2.0810      1.2173            15.4      1.71     0.1074
Area × Time    Rural     6      Urban    12       1.0810      1.2173            15.4      0.89     0.3882
Area × Time    Rural    12      Urban     0       6.5190      1.2173            15.4      5.36     <0.0001
Area × Time    Rural    12      Urban     6       5.3190      1.2173            15.4      4.37     0.0005
Area × Time    Rural    12      Urban    12       4.3190      1.2173            15.4      3.55     0.0028
Area × Time    Urban     0      Urban     6      -1.2000      0.1591            90       -7.54     <0.0001
Area × Time    Urban     0      Urban    12      -2.2000      0.1591            90      -13.83     <0.0001
Area × Time    Urban     6      Urban    12      -1.0000      0.1591            90       -6.28     <0.0001


gives the Area x Time means. Table 26.42 provides pairwise comparisons between various 
subsets of these means. 


26.4 Concluding Remarks 

The analysis of repeated measures designs was described for three examples involving
repeated measures where split-plot in time assumptions hold. Included in the discussion
were the models and assumptions for each example. Computational formulas for obtaining
standard errors for multiple comparisons were given as well as methods for investigating
various contrasts among the means. Analyses using the SAS-Mixed procedure were
obtained in Section 26.3 and can be compared with the results given in Section 26.2.


26.5 Exercises 

26.1 Phlebitis is an inflammation of a blood vein that can occur when intravenously
administering drugs. The active drug was thought to be the main contributing
factor to inflammation, although the solution used as a vehicle to carry the drug
throughout the blood stream could be a possible contributor. Investigators
wanted to be able to detect, if possible, the onset of phlebitis, and a study was
designed to explore mechanisms for early detection of phlebitis during amiodarone
therapy. They believed that a change in tissue temperature near the intravenous
administration is an early signal of impending inflammation. To test their
belief and to see if amiodarone had an effect, one of three intravenous treatments
was administered to one of the ears of a rabbit. The surface area temperature of the
treated ear was measured at various time points and compared with the surface
area temperature of the rabbit's other (untreated) ear. The treatments were amiodarone
with a vehicle solution to carry the drug, the vehicle solution only, and a
saline solution. Five rabbits were randomly assigned to each of the three treatments.
The difference in the temperature between the treated ear and the untreated
ear is used as the response. The data are given in Table 26.43. Give an analysis of
these data assuming that the repeated measures satisfy the split-plot-in-time
assumption. Please be sure to address the following questions:

1) Is there a significant TIME × TRT interaction? Why or why not?

2) Are there differences between the three treatments? Are there time differences?
Explain your answers.

3) Which treatment seems to have the greatest effect on ear temperature
difference?

4) The investigators chose to use the difference in temperature between the
treated ear and the untreated ear as the variable to analyze. Some statisticians
might have suggested that they use the untreated ear temperature as a
covariate and then perform an analysis of covariance. You might wish to
discuss such an alternative during one of your class sessions.

TABLE 26.43

Differences in Temperature between the Treated
Ear and the Untreated Ear

                                   Time
Rabbit    Treatment        0       30      60      90
  1       Amiodarone     -0.3    -0.2     1.2     3.1
  2       Amiodarone     -0.5     2.2     3.3     3.7
  3       Amiodarone     -1.1     2.4     2.2     2.7
  4       Amiodarone      1.0     1.7     2.1     2.5
  5       Amiodarone     -0.3     0.8     0.6     0.9
  6       Vehicle        -1.1    -2.2     0.2     0.3
  7       Vehicle        -1.4    -0.2    -0.5    -0.1
  8       Vehicle        -0.1    -0.1    -0.5    -0.3
  9       Vehicle        -0.2     0.1    -0.2     0.4
 10       Vehicle        -0.1    -0.2     0.7    -0.3
 11       Saline         -1.8     0.2     0.1     0.6
 12       Saline         -0.5     0.0     1.0     0.5
 13       Saline         -1.0    -0.3    -2.1     0.6
 14       Saline          0.4     0.4    -0.7    -0.3
 15       Saline         -0.5     0.9    -0.4    -0.3

26.2 A study was conducted to evaluate the effect of six diets on cholesterol levels of
adult males. The diets are constructed from the six combinations of two Fiber
levels (Low, High) and three Fat levels (Low, Med, and High). Thirty-six persons
were randomly assigned to the six diets, six persons per diet. Each person
was on his assigned diet for eight months. Each person's cholesterol level was
determined every two months (Chol1 = 2 months, Chol2 = 4 months, Chol3 = 6
months, and Chol4 = 8 months). Provide a complete analysis of these data assuming
that the split-plot-in-time assumptions hold in order to evaluate the effect of
diet and time on cholesterol level. The data are given in Table 26.44.


TABLE 26.44

Cholesterol Levels over Time

Subject  Fat   Fiber  Chol1  Chol2  Chol3  Chol4
   1     Low   Low     175     .     134    138
   2     Low   Low     192    169    142    137
   3     Low   Low     153    132    115    114
   4     Low   Low     204    184    162    164
   5     Low   Low     194    173    149    151
   6     Low   Low     224    194    164    170
   7     Low   High    163    145    132    129
   8     Low   High    204    183    163    166
   9     Low   High    170    148    129    128
  10     Low   High    182    165     .      .
  11     Low   High    186    160    140    139
  12     Low   High    181    169    150    148
  13     Med   Low     223     .     191    191
  14     Med   Low     217    213    204     .
  15     Med   Low     236    219    205    206
  16     Med   Low     201    182    165    168
  17     Med   Low     220    205    198    205
  18     Med   Low     191    189    183    188
  19     Med   High    183    183    168    174
  20     Med   High    218     .     192    183
  21     Med   High    200    189    182    181
  22     Med   High    210    193    182     .
  23     Med   High    208    202    192    192
  24     Med   High    211    193    185    185
  25     High  Low     237    243    244    246
  26     High  Low     242    237    231    227
  27     High  Low     224    217    219    229
  28     High  Low     217    225    237    238
  29     High  Low     225    226    227    228
  30     High  Low     229    223    226    226
  31     High  High    193    202    206    197
  32     High  High    189    193    189    185
  33     High  High    231    222    217    209
  34     High  High    237    233    226    224
  35     High  High    210    208    208    206
  36     High  High    214    205    206    199

A period (.) denotes a missing value.
TABLE 26.45

Cheese Quality Measures

Refrigerator  Position  Moisture    Y
     1        Top       Low         97
     1        Top       High        99
     1        Middle    Low        101
     1        Middle    High       105
     1        Bottom    Low        105
     1        Bottom    High       110
     2        Top       Low        107
     2        Top       High       107
     2        Middle    Low        108
     2        Middle    High       111
     2        Bottom    Low        120
     2        Bottom    High       125
     3        Top       Low        103
     3        Top       High        92
     3        Middle    Low        103
     3        Middle    High        93
     3        Bottom    Low        107
     3        Bottom    High       106
     4        Top       Low         91
     4        Top       High        99
     4        Middle    Low         93
     4        Middle    High       106
     4        Bottom    Low         93
     4        Bottom    High         .
     5        Top       Low         97
     5        Top       High       103
     5        Middle    Low         93
     5        Middle    High         .
     5        Bottom    Low         95
     5        Bottom    High       111
     6        Top       Low         96
     6        Top       High        94
     6        Middle    Low        101
     6        Middle    High       106
     6        Bottom    Low        107
     6        Bottom    High       115
     7        Top       Low        101
     7        Top       High       100
     7        Middle    Low         97
     7        Middle    High        98
     7        Bottom    Low        100
     7        Bottom    High       105
     8        Top       Low         93
     8        Top       High         .
     8        Middle    Low         95
     8        Middle    High         .
     8        Bottom    Low         96
     8        Bottom    High         .

A period (.) denotes a missing value.

26.3 An experiment involves the curing of cheese in refrigerators where each refrigerator is divided into left and right compartments and into top, middle, and bottom sections. The right and left sides are randomly assigned a level of moisture (low or high). There was a belief that the position in the refrigerator might have slightly different temperatures as heat rises, so the refrigerator was divided into thirds (top, middle, bottom), and one package of cheese was placed in each section of the refrigerator; that is, there are six packages of cheese in each refrigerator with one package in each section. This process was repeated for eight different refrigerators. The data are given in Table 26.45.

1) Identify each size of experimental unit. 

2) Describe the treatment and design structures for each size of experimental 
unit. 

3) Write out a model that can be used to describe the data structure. 

4) Carry out analysis of this data set and make any necessary comparisons. 





Analysis of Repeated Measures Experiments 
When the Ideal Conditions Are Not Satisfied 


Repeated measures designs involve one or more steps where a researcher cannot randomly assign the levels of one or more of the factors to an experimental unit. The use of time as a factor is the most common experimental situation where one cannot use randomization. For example, when data are collected on the same experimental unit at several time points, one cannot randomize the order of the time points. Time 1 must be first, time 2 must be second, and so on. This nonrandom assignment of the repeated measures factor influences the variances and covariances between the experimental units, and the ideal conditions described in Chapter 26 may not be valid. This chapter presents strategies for analyzing data from repeated measures experiments when the ideal conditions given in Chapter 26 do not hold. In addition, procedures are described that allow one to check whether the ideal conditions described in Chapter 26 are satisfied.


27.1 Introduction 

Consider an experimental situation similar to that described in Table 26.1. Let $y_{ijk}$ represent the observed response for subject k in treatment group i at time j and let

\[
\mathbf{y}_{ik} = \begin{bmatrix} y_{i1k} \\ y_{i2k} \\ \vdots \\ y_{ipk} \end{bmatrix}
\]

be the vector of responses for subject k in treatment group i.

A model that can be used to describe these data is

\[
y_{ijk} = \mu + \alpha_i + \tau_j + \gamma_{ij} + \varepsilon^*_{ijk}, \quad i = 1,2,\ldots,t;\ j = 1,2,\ldots,p;\ k = 1,2,\ldots,n_i
\]
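Let

\[
\boldsymbol{\varepsilon}^*_{ik} = \begin{bmatrix} \varepsilon^*_{i1k} \\ \varepsilon^*_{i2k} \\ \vdots \\ \varepsilon^*_{ipk} \end{bmatrix}
\]

be the vector of errors for subject k in treatment group i. Suppose that the $\boldsymbol{\varepsilon}^*_{ik}$ are distributed independently and identically as p-variate multivariate normal distributions with mean $\mathbf{0}$ and covariance matrix $\Sigma$. That is, $\boldsymbol{\varepsilon}^*_{ik} \sim$ i.i.d. $N_p(\mathbf{0}, \Sigma)$, $i = 1,2,\ldots,t$; $k = 1,2,\ldots,n_i$.
Let

\[
\Sigma = \begin{bmatrix}
\sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\
\sigma_{21} & \sigma_{22} & \cdots & \sigma_{2p} \\
\vdots & \vdots & & \vdots \\
\sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp}
\end{bmatrix}
\]

represent the covariance matrix of the vector of repeated measures.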

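Definition 27.1: If $\Sigma = \lambda I_p + \boldsymbol{\eta}\mathbf{j}' + \mathbf{j}\boldsymbol{\eta}'$, where $\mathbf{j}$ is a p x 1 vector of ones and $\boldsymbol{\eta}$ is a p x 1 vector of constants, the repeated measures are said to satisfy conditions known as the Huynh-Feldt (H-F) conditions (see Huynh and Feldt, 1970).

When Definition 27.1 holds, $\Sigma$ has the form

\[
\Sigma = \begin{bmatrix}
\lambda + 2\eta_1 & \eta_1 + \eta_2 & \cdots & \eta_1 + \eta_p \\
\eta_2 + \eta_1 & \lambda + 2\eta_2 & \cdots & \eta_2 + \eta_p \\
\vdots & \vdots & & \vdots \\
\eta_p + \eta_1 & \eta_p + \eta_2 & \cdots & \lambda + 2\eta_p
\end{bmatrix}
\]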

A special case of the H-F conditions occurs when the repeated measures possess a compound symmetry covariance structure. Then

\[
\Sigma = \sigma^2 \begin{bmatrix}
1 & \rho & \cdots & \rho \\
\rho & 1 & \cdots & \rho \\
\vdots & \vdots & & \vdots \\
\rho & \rho & \cdots & 1
\end{bmatrix}
\]

for some $\sigma^2$ and $\rho$. Note that the ideal conditions described for the split-plot-in-time analysis discussed in Section 26.1 are a special case of the covariance matrix of the repeated measures possessing compound symmetry with $\sigma^2 = \sigma_\delta^2 + \sigma_\varepsilon^2$ and $\rho = \sigma_\delta^2/(\sigma_\delta^2 + \sigma_\varepsilon^2)$. Compound symmetry structure is more general than the split-plot-in-time structure since $\rho$ can be negative in the compound symmetry structure.
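
The relationship between the H-F structure and compound symmetry can be checked numerically. The following is a minimal Python sketch, not part of the original text: the function names and the numeric values are hypothetical and chosen only for illustration. It builds a covariance matrix of the form in Definition 27.1, builds the compound symmetry matrix implied by the split-plot-in-time variance components, and uses the standard fact that under the H-F structure every difference of two repeated measures has the same variance, 2λ.

```python
import math

def huynh_feldt_cov(lam, eta):
    """Build Sigma = lam*I + eta*j' + j*eta' from a scalar lam and a list eta."""
    p = len(eta)
    return [[(lam if i == j else 0.0) + eta[i] + eta[j] for j in range(p)]
            for i in range(p)]

def compound_symmetry_cov(var_delta, var_eps, p):
    """Compound symmetry implied by the split-plot-in-time variance components."""
    sigma2 = var_delta + var_eps      # common variance
    rho = var_delta / sigma2          # common correlation
    return [[sigma2 if i == j else rho * sigma2 for j in range(p)]
            for i in range(p)]

def difference_variances(cov):
    """Var(y_i - y_j) = s_ii + s_jj - 2*s_ij for every pair i < j."""
    p = len(cov)
    return [cov[i][i] + cov[j][j] - 2.0 * cov[i][j]
            for i in range(p) for j in range(i + 1, p)]

# Hypothetical values, used only to illustrate the two structures.
hf = huynh_feldt_cov(2.0, [0.5, 1.0, -0.2, 0.3])
cs = compound_symmetry_cov(3.0, 1.0, 4)

print(difference_variances(hf))  # each entry equals 2*lam up to rounding error
print(difference_variances(cs))  # each entry equals 2*var_eps up to rounding error
```

In this sketch the compound symmetry matrix is itself of H-F form with λ = σ²(1 − ρ) and every element of η equal to σ²ρ/2, which is one way to see why compound symmetry is a special case of the H-F conditions.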
Remark: If the H-F conditions are satisfied, then many of the important questions involving time comparisons can be answered by analyzing the repeated measures experiment in the same way one analyzes a repeated measures experiment satisfying the ideal conditions given in Section 26.1, and if the repeated measures satisfy compound symmetry, then the split-plot-in-time methods of analysis given in Chapter 26 can be used. In particular, the split-plot-in-time tests for the Time main effect and the Time x Trt interaction effect can be shown to be statistically valid if and only if the repeated measures satisfy the H-F conditions given in Definition 27.1. Furthermore, contrasts that compare Time main effects and contrasts that compare Time effects within a specified value of the Trt variable are statistically valid when using the split-plot-in-time analysis. If one is interested in the two-way means or marginal means, the split-plot-in-time analysis gives the correct estimates of the means, but the standard errors will be incorrect.

Remark: If the repeated measures possess compound symmetry with $\rho > 0$, then all of the results given by a split-plot-in-time analysis will be correct. In this case, one can say that $\varepsilon^*_{ijk} = \delta_{ik} + \varepsilon_{ijk}$ where $\delta_{ik} \sim$ i.i.d. $N(0, \sigma_\delta^2)$, $\varepsilon_{ijk} \sim$ i.i.d. $N(0, \sigma_\varepsilon^2)$, and the $\delta_{ik}$ and the $\varepsilon_{ijk}$ are independent.

Question: What if the H-F conditions are not satisfied? 

In the case where the H-F conditions are not satisfied, several methods of analysis can be considered. One approach that is always appropriate is to treat the vector of repeated measures as a multivariate response vector and use multivariate analysis of variance (MANOVA) methods. A second is to use the split-plot-in-time analysis, but adjust the p-values by adjusting the degrees of freedom corresponding to the relevant effect mean squares. The third approach is to use a mixed model approach, available in the SAS®-Mixed and SAS®-Glimmix procedures, and model the covariance structure.

MANOVA methods are described in Section 27.2, adjusted degrees of freedom methods are discussed in Section 27.3, and mixed model methods are discussed in Section 27.4.


27.2 MANOVA Methods 

This section considers using multivariate analysis of variance methods to analyze repeated measures experiments. These methods are always appropriate when

\[
\boldsymbol{\varepsilon}^*_{ik} \sim \text{i.i.d. } N_p(\mathbf{0}, \Sigma), \quad i = 1,2,\ldots,t;\ k = 1,2,\ldots,n_i
\]

The MANOVA methods can only be applied to experiments where either all repeated measures on a given experimental unit are present or all are missing, since any experimental unit that has a missing value for one or more of the repeated measures will be automatically deleted from the analysis by the statistical software being used. This is not a problem with the other two approaches. The number of experimental units assigned to each treatment need not be balanced. To get the most out of this section, one should be able to understand the matrix form of the model discussed in Chapter 6.
A general multivariate model for a repeated measures experiment that has a structure similar to that given in Table 26.1 is a generalization of the matrix form of the model defined in Equation 26.1. The multivariate model can be expressed as

\[
Y = XB + E \qquad (27.1)
\]

where Y denotes all of the data measured in the experiment. Each row of the data matrix Y corresponds to a particular experimental unit, and each column corresponds to one of the repeated measures. Thus Y is an N x p matrix where $N = \sum_i n_i$. The matrix X is an N x r design matrix assumed to be of rank t. Each column of B is an r x 1 vector of unknown parameters, with each column corresponding to a particular repeated measure. The matrix E is an N x p matrix of unobservable random errors. It is assumed that the rows of E are independently distributed $N_p(\mathbf{0}, \Sigma)$. Thus, while the rows in E are independent, the elements in a row may be correlated with one another and may have different variances.

For the multivariate model (27.1), one can test general hypotheses of the form

\[
H_0\colon CBM = 0 \quad \text{vs} \quad H_a\colon CBM \neq 0 \qquad (27.2)
\]

where C is a g x r matrix of rank g and M is a p x q matrix of rank q.

To test the hypothesis in Equation 27.2, one first needs the least squares estimates of the parameters in B and an observed residual sum-of-squares and cross-products matrix. These are denoted by $\hat{B}$ and $\hat{E}$, respectively, and given by

\[
\hat{B} = (X'X)^{-}X'Y \quad \text{and} \quad \hat{E} = Y'[I - X(X'X)^{-}X']Y \qquad (27.3)
\]


A likelihood ratio test statistic for testing the hypothesis in Equation 27.2 is given by

\[
\Lambda = \frac{|R|}{|H + R|} \qquad (27.4)
\]

where

\[
R = M'\hat{E}M, \qquad H = M'\hat{B}'C'[C(X'X)^{-}C']^{-1}C\hat{B}M
\]

and |W| denotes the determinant of the matrix W.

The statistic $\Lambda$ is called Wilks' likelihood ratio criterion (Morrison, 1976). The sampling distribution of $\Lambda$ is quite complicated, but for most practical purposes, an approximate $\alpha$-level test can be obtained by rejecting $H_0$ when

\[
-\left[N - t - \frac{|q - g| + 1}{2}\right]\log_e(\Lambda) > \chi^2_{\alpha,\,qg}
\]
A better approximation, to be used only when both q and g are greater than 2, is to reject $H_0$ when

\[
F > F_{\alpha,\,qg,\,ab-c}
\]

where

\[
F = \frac{(1 - \Lambda^{1/b})(ab - c)}{qg\,\Lambda^{1/b}} \qquad (27.5)
\]

and where

\[
a = N - t - \frac{|q - g| + 1}{2}, \qquad
b = \left[\frac{q^2g^2 - 4}{q^2 + g^2 - 5}\right]^{1/2}, \qquad
c = \frac{qg - 2}{2}
\]

and

\[
s = \min(q, g)
\]

Exact F tests for Equation 27.2 exist whenever q = 1, 2 or whenever g = 1, 2. These tests are as follows:

1) For g = 1 and any q, reject $H_0$ if

\[
F = \left(\frac{1 - \Lambda}{\Lambda}\right)\left(\frac{N - t - q + 1}{q}\right) > F_{\alpha,\,q,\,N-t-q+1} \qquad (27.6)
\]

2) For q = 1 and any g, reject $H_0$ if

\[
F = \left(\frac{1 - \Lambda}{\Lambda}\right)\left(\frac{N - t}{g}\right) > F_{\alpha,\,g,\,N-t} \qquad (27.7)
\]

3) For g = 2 and any q > 1, reject $H_0$ if

\[
F = \left(\frac{1 - \sqrt{\Lambda}}{\sqrt{\Lambda}}\right)\left(\frac{N - t - q + 1}{q}\right) > F_{\alpha,\,2q,\,2(N-t-q+1)} \qquad (27.8)
\]

4) For q = 2 and any g > 1, reject $H_0$ if

\[
F = \left(\frac{1 - \sqrt{\Lambda}}{\sqrt{\Lambda}}\right)\left(\frac{N - t - 1}{g}\right) > F_{\alpha,\,2g,\,2(N-t-1)} \qquad (27.9)
\]
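
The four exact tests can be collected into one small function. The following is a minimal Python sketch under stated assumptions: the function name `exact_wilks_f` and its argument names are hypothetical, `error_df` stands for N − t, and the function returns only the F statistic and its degrees of freedom (it does not compute a p-value). The two calls use the values of Λ, q, and g that arise for the sorghum example analyzed later in this section.

```python
import math

def exact_wilks_f(wilks, q, g, error_df):
    """Exact F transformation of Wilks' lambda when q or g is 1 or 2.

    wilks    : value of Wilks' likelihood ratio criterion
    q        : number of columns of M
    g        : number of rows of C
    error_df : N - t
    Returns (F, numerator df, denominator df).
    """
    if g == 1:                                   # Equation 27.6
        den = error_df - q + 1
        return (1 - wilks) / wilks * den / q, q, den
    if q == 1:                                   # Equation 27.7
        return (1 - wilks) / wilks * error_df / g, g, error_df
    root = math.sqrt(wilks)
    if g == 2:                                   # Equation 27.8
        den = error_df - q + 1
        return (1 - root) / root * den / q, 2 * q, 2 * den
    if q == 2:                                   # Equation 27.9
        den = error_df - 1
        return (1 - root) / root * den / g, 2 * g, 2 * den
    raise ValueError("no exact F test: both q and g exceed 2")

# Variety main effect (q = 1, g = 3) and Time main effect (q = 4, g = 1), N - t = 12.
print(exact_wilks_f(0.04345, q=1, g=3, error_df=12))
print(exact_wilks_f(0.00502, q=4, g=1, error_df=12))
```

When both q and g exceed 2, the function raises an error rather than silently falling back to the approximation in Equation 27.5, so the caller has to choose the approximation deliberately.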


One drawback to the multivariate method is that one must have p < N - t. When p > N - t, it is often possible to combine adjacent repeated measures into p* new variables or to analyze only a size p* subset of the repeated measures where p* < N - t.

To illustrate the analysis described in this section, consider an experiment conducted to study the differences among four varieties of sorghum and five fertilizer levels on a leaf area index, where the four varieties of sorghum are denoted by $V_1$, $V_2$, $V_3$, and $V_4$ and the five fertilizer levels are denoted by 1, 2, 3, 4, and 5. Also suppose that these 20 variety x fertilizer combinations were randomly assigned to 20 plots in a field. For this example, it is assumed that there is no interaction between fertilizer levels and varieties and that a basic two-way additive model can be used to analyze the data. Finally, assume that leaf area index measurements are made on each variety x fertilizer plot at weekly intervals for five weeks beginning two weeks after emergence of the plant. The data obtained are given in Table 27.1.


TABLE 27.1

Leaf Area Index on Four Sorghum Varieties

                                        Time
Variety  Fertilizer  Week 1  Week 2  Week 3  Week 4  Week 5
V1           1        5.00    4.84    4.02    3.75    3.13
V1           2        4.42    4.30    3.67    3.23    2.83
V1           3        4.42    4.10    3.46    3.09    2.82
V1           4        4.01    3.89    3.21    2.89    2.56
V1           5        3.36    3.10    2.67    2.47    2.16
V2           1        5.82    5.60    5.05    4.72    4.46
V2           2        5.73    5.59    5.00    4.65    4.42
V2           3        5.31    5.19    4.86    4.44    4.22
V2           4        4.92    4.66    4.56    4.16    3.99
V2           5        3.96    3.86    3.50    3.13    2.95
V3           1        5.65    5.97    5.27    5.07    4.52
V3           2        5.39    5.49    5.08    4.87    4.32
V3           3        5.15    5.28    4.93    4.67    4.15
V3           4        4.50    4.89    4.74    4.49    4.10
V3           5        3.75    3.74    3.55    3.28    3.00
V4           1        5.86    5.60    5.37    5.00    4.37
V4           2        5.82    5.55    5.29    4.95    4.07
V4           3        5.26    5.06    4.76    4.48    3.94
V4           4        4.87    4.75    4.55    4.33    3.83
V4           5        3.96    3.76    3.56    3.18    2.96

For the data in Table 27.1, the data matrix is

\[
Y = \begin{bmatrix}
5.00 & 4.84 & 4.02 & 3.75 & 3.13 \\
4.42 & 4.30 & 3.67 & 3.23 & 2.83 \\
4.42 & 4.10 & 3.46 & 3.09 & 2.82 \\
4.01 & 3.89 & 3.21 & 2.89 & 2.56 \\
3.36 & 3.10 & 2.67 & 2.47 & 2.16 \\
5.82 & 5.60 & 5.05 & 4.72 & 4.46 \\
5.73 & 5.59 & 5.00 & 4.65 & 4.42 \\
5.31 & 5.19 & 4.86 & 4.44 & 4.22 \\
4.92 & 4.66 & 4.56 & 4.16 & 3.99 \\
3.96 & 3.86 & 3.50 & 3.13 & 2.95 \\
5.65 & 5.97 & 5.27 & 5.07 & 4.52 \\
5.39 & 5.49 & 5.08 & 4.87 & 4.32 \\
5.15 & 5.28 & 4.93 & 4.67 & 4.15 \\
4.50 & 4.89 & 4.74 & 4.49 & 4.10 \\
3.75 & 3.74 & 3.55 & 3.28 & 3.00 \\
5.86 & 5.60 & 5.37 & 5.00 & 4.37 \\
5.82 & 5.55 & 5.29 & 4.95 & 4.07 \\
5.26 & 5.06 & 4.76 & 4.48 & 3.94 \\
4.87 & 4.75 & 4.55 & 4.33 & 3.83 \\
3.96 & 3.76 & 3.56 & 3.18 & 2.96
\end{bmatrix}
\]

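The matrix of parameters is given by

\[
B = \begin{bmatrix}
\mu^{(1)} & \mu^{(2)} & \mu^{(3)} & \mu^{(4)} & \mu^{(5)} \\
\tau_1^{(1)} & \tau_1^{(2)} & \tau_1^{(3)} & \tau_1^{(4)} & \tau_1^{(5)} \\
\tau_2^{(1)} & \tau_2^{(2)} & \tau_2^{(3)} & \tau_2^{(4)} & \tau_2^{(5)} \\
\tau_3^{(1)} & \tau_3^{(2)} & \tau_3^{(3)} & \tau_3^{(4)} & \tau_3^{(5)} \\
\tau_4^{(1)} & \tau_4^{(2)} & \tau_4^{(3)} & \tau_4^{(4)} & \tau_4^{(5)} \\
\beta_1^{(1)} & \beta_1^{(2)} & \beta_1^{(3)} & \beta_1^{(4)} & \beta_1^{(5)} \\
\beta_2^{(1)} & \beta_2^{(2)} & \beta_2^{(3)} & \beta_2^{(4)} & \beta_2^{(5)} \\
\beta_3^{(1)} & \beta_3^{(2)} & \beta_3^{(3)} & \beta_3^{(4)} & \beta_3^{(5)} \\
\beta_4^{(1)} & \beta_4^{(2)} & \beta_4^{(3)} & \beta_4^{(4)} & \beta_4^{(5)} \\
\beta_5^{(1)} & \beta_5^{(2)} & \beta_5^{(3)} & \beta_5^{(4)} & \beta_5^{(5)}
\end{bmatrix}
\]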





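where the $\tau$ correspond to the different varieties and the $\beta$ correspond to the different levels of fertilizer. The design matrix is

\[
X = \begin{bmatrix}
1 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
1 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 \\
1 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 \\
1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 \\
1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 \\
1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}
\]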

Note that each column of B represents the parameters needed for a two-way additive model for the jth response column of the data matrix Y, j = 1, 2, 3, 4, 5. The design matrix X is a 20 x 10 matrix and has rank equal to 8; thus N = 20, r = 10, and t = 8. The value of $\hat{B}$ given by Equation 27.3 is


\[
\hat{B} = \begin{bmatrix}
3.350 & 3.283 & 3.003 & 2.788 & 2.150 \\
0.222 & 0.106 & -0.198 & -0.260 & -0.312 \\
1.128 & 1.040 & 0.990 & 0.874 & 0.996 \\
0.868 & 1.134 & 1.110 & 1.130 & 1.006 \\
1.134 & 1.004 & 1.102 & 1.042 & 0.822 \\
1.395 & 1.398 & 1.173 & 1.115 & 0.982 \\
1.152 & 1.128 & 1.006 & 0.940 & 0.772 \\
0.847 & 0.803 & 0.748 & 0.685 & 0.645 \\
0.387 & 0.443 & 0.511 & 0.483 & 0.482 \\
-0.430 & -0.489 & -0.434 & -0.470 & -0.370
\end{bmatrix}
\]
and the value of $\hat{E}$ in Equation 27.3 is

\[
\hat{E} = \begin{bmatrix}
0.237 & 0.171 & 0.162 & 0.228 & 0.129 \\
0.171 & 0.247 & 0.163 & 0.231 & 0.135 \\
0.162 & 0.163 & 0.268 & 0.303 & 0.184 \\
0.228 & 0.231 & 0.303 & 0.392 & 0.241 \\
0.129 & 0.135 & 0.184 & 0.241 & 0.247
\end{bmatrix}
\]


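The test for equal variety main effect means is obtained from Equation 27.4 by taking

\[
C = \begin{bmatrix}
0 & 1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & -1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & -1 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\quad \text{and} \quad
M = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}
\]

Then $\Lambda = 0.04345$ with g = 3 and q = 1. Since q = 1, one can use Equation 27.7 and get

\[
F = \left(\frac{1 - 0.04345}{0.04345}\right)\left(\frac{12}{3}\right) = 88.06
\]

with 3 and 12 degrees of freedom. The observed significance level $\hat{\alpha}$ is less than 0.0001.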
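The test for equal time main effect means is obtained from Equation 27.4 by taking

\[
C = \begin{bmatrix} 1 & \tfrac14 & \tfrac14 & \tfrac14 & \tfrac14 & \tfrac15 & \tfrac15 & \tfrac15 & \tfrac15 & \tfrac15 \end{bmatrix}
\quad \text{and} \quad
M = \begin{bmatrix}
1 & 1 & 1 & 1 \\
-1 & 0 & 0 & 0 \\
0 & -1 & 0 & 0 \\
0 & 0 & -1 & 0 \\
0 & 0 & 0 & -1
\end{bmatrix}
\]

One gets $\Lambda = 0.00502$ with g = 1 and q = 4. Since g = 1, one can use Equation 27.6 and get

\[
F = \left(\frac{1 - 0.00502}{0.00502}\right)\left(\frac{9}{4}\right) = 445.96
\]

with 4 and 9 degrees of freedom. The observed significance level $\hat{\alpha}$ is less than 0.0001.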
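The test for Variety x Time interaction is obtained from Equation 27.4 by taking

\[
C = \begin{bmatrix}
0 & 1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & -1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & -1 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\quad \text{and} \quad
M = \begin{bmatrix}
1 & 1 & 1 & 1 \\
-1 & 0 & 0 & 0 \\
0 & -1 & 0 & 0 \\
0 & 0 & -1 & 0 \\
0 & 0 & 0 & -1
\end{bmatrix}
\]
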
One gets $\Lambda = 0.01426$ with g = 3 and q = 4. Since both g and q are greater than 2, one can use Equation 27.5 with s = 3, a = 11, b = 2.646, and c = 5. Then

\[
F = \frac{(1 - 0.01426^{1/2.646})(11 \times 2.646 - 5)}{4 \times 3 \times 0.01426^{1/2.646}} = \frac{(1 - 0.2006)(24.1060)}{12(0.2006)} = 8.00
\]

with 12 and 24.1 degrees of freedom. The observed significance level $\hat{\alpha}$ is less than 0.0001.
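
The arithmetic behind Equation 27.5 can be reproduced with a few lines of code. The following is a minimal Python sketch: the function name `rao_f_approximation` is hypothetical, `error_df` stands for N − t, and the formulas for a, b, and c are those given with Equation 27.5. The first call uses the Variety x Time values above; the second uses the Wilks value reported for the Fertilizer x Time interaction in Table 27.5.

```python
import math

def rao_f_approximation(wilks, q, g, error_df):
    """F approximation to Wilks' lambda (Equation 27.5).

    Returns (F, numerator df, denominator df); error_df is N - t.
    """
    a = error_df - (abs(q - g) + 1) / 2
    b = math.sqrt((q * q * g * g - 4) / (q * q + g * g - 5))
    c = (q * g - 2) / 2
    root = wilks ** (1 / b)          # Lambda^(1/b)
    den_df = a * b - c
    f = (1 - root) * den_df / (q * g * root)
    return f, q * g, den_df

# Variety x Time: q = 4, g = 3, N - t = 12
print(rao_f_approximation(0.01426, q=4, g=3, error_df=12))
# Fertilizer x Time: q = 4, g = 4, N - t = 12
print(rao_f_approximation(0.10535731, q=4, g=4, error_df=12))
```

The denominator degrees of freedom are generally not integers (24.1 and 28.1 here), so a p-value requires an F distribution routine that accepts fractional degrees of freedom; none is included in this sketch.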

Note that the test for equal variety means is the same as that obtained by performing the split-plot-in-time analysis. It is the only test of the three that is the same as the corresponding test obtained by a split-plot-in-time analysis. Test statistics for the fertilizer main effect (F = 94.36) and the fertilizer x time interaction effect (F = 1.91) can be obtained in a similar manner.

Many special functions of the parameters in B are estimable, and inferences can be made on those estimable functions. Most of the interesting linear functions of the parameters in B can be written in the form $c'Bm$ where c is an r x 1 vector and m is a p x 1 vector. As in Chapter 6, $c'Bm$ is estimable if and only if there exists a vector u such that $X'Xu = c$. There are no restrictions on m.

The best estimate of $c'Bm$ is $c'\hat{B}m$ where $\hat{B}$ is given in Equation 27.3. The estimated standard error of $c'\hat{B}m$ is given by

\[
\text{s.e.}(c'\hat{B}m) = \sqrt{c'(X'X)^{-}c \cdot \frac{m'\hat{E}m}{N - t}}
\]

and its corresponding degrees of freedom are N - t. Thus a $(1 - \alpha)100\%$ confidence interval for $c'Bm$ is given by

\[
c'\hat{B}m \pm t_{\alpha/2,\,N-t}\ \text{s.e.}(c'\hat{B}m) \qquad (27.10)
\]

and a t-statistic for testing $H_0\colon c'Bm = a_0$ is given by

\[
t = \frac{c'\hat{B}m - a_0}{\text{s.e.}(c'\hat{B}m)} \qquad (27.11)
\]

If $|t| > t_{\alpha/2,\,N-t}$, then $H_0$ is rejected.

As an example, consider estimating the $V_1$ marginal mean. For this marginal mean,

\[
c' = [1 \ \ 1 \ \ 0 \ \ 0 \ \ 0 \ \ 0.2 \ \ 0.2 \ \ 0.2 \ \ 0.2 \ \ 0.2]
\quad \text{and} \quad
m = \begin{bmatrix} 0.2 \\ 0.2 \\ 0.2 \\ 0.2 \\ 0.2 \end{bmatrix}
\]

The value of $c'\hat{B}m$ is 3.496, and its estimated standard error is 0.0594. A 95% confidence interval for the $V_1$ marginal mean is found from Equation 27.10 as 3.496 ± (2.179)(0.0594).
As a second example, consider estimating the Time 1 marginal mean for the data in Table 27.1. For this marginal mean,

\[
c' = [1 \ \ 0.25 \ \ 0.25 \ \ 0.25 \ \ 0.25 \ \ 0.2 \ \ 0.2 \ \ 0.2 \ \ 0.2 \ \ 0.2]
\quad \text{and} \quad
m = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}
\]

The value of $c'\hat{B}m$ is 4.858, and its estimated standard error is 0.0315.

As a third example, consider estimating the difference between the $V_1$ and $V_2$ marginal means. Here

\[
c' = [0 \ \ 1 \ \ {-1} \ \ 0 \ \ 0 \ \ 0 \ \ 0 \ \ 0 \ \ 0 \ \ 0]
\quad \text{and} \quad
m = \begin{bmatrix} 0.2 \\ 0.2 \\ 0.2 \\ 0.2 \\ 0.2 \end{bmatrix}
\]

The value of $c'\hat{B}m$ is -1.094, and its estimated standard error is 0.0840. A t-statistic for comparing these two marginal means is t = -1.094/0.0840 = -13.02 and its observed significance level is $\hat{\alpha}$ < 0.0001.
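
The three standard errors above can be reproduced from the error SSCP matrix. The following is a minimal Python sketch under stated assumptions: the function names are hypothetical, the matrix entries are the error SSCP values that SAS reports for these data (Table 27.3), and the values of $c'(X'X)^{-}c$ are supplied by hand rather than computed from a generalized inverse. For this balanced additive design they are assumed to be 1/5 for a variety marginal mean (an average over 5 plots), 1/20 for a time marginal mean (an average over 20 plots), and 2/5 for a difference of two variety marginal means.

```python
import math

# Error SSCP matrix for the leaf area index data (values as reported in Table 27.3).
E_HAT = [
    [0.23699,  0.171285, 0.162175, 0.22799,  0.129105],
    [0.171285, 0.24678,  0.16292,  0.231465, 0.135435],
    [0.162175, 0.16292,  0.26763,  0.30261,  0.184435],
    [0.22799,  0.231465, 0.30261,  0.39232,  0.24145],
    [0.129105, 0.135435, 0.184435, 0.24145,  0.24683],
]
ERROR_DF = 12  # N - t

def quadratic_form(m, mat):
    """Return m' * mat * m for a list m and a square list-of-lists mat."""
    n = len(m)
    return sum(m[i] * mat[i][j] * m[j] for i in range(n) for j in range(n))

def std_error(c_ginv_c, m, e_hat=E_HAT, error_df=ERROR_DF):
    """Estimated standard error of c'Bm, given the scalar c'(X'X)^- c."""
    return math.sqrt(c_ginv_c * quadratic_form(m, e_hat) / error_df)

average_over_time = [0.2] * 5
time_1_only = [1, 0, 0, 0, 0]

se_v1 = std_error(1 / 5, average_over_time)      # V1 marginal mean
se_time1 = std_error(1 / 20, time_1_only)        # Time 1 marginal mean
se_diff = std_error(2 / 5, average_over_time)    # V1 - V2 difference

print(round(se_v1, 4), round(se_time1, 4), round(se_diff, 4))
print(3.496 - 2.179 * se_v1, 3.496 + 2.179 * se_v1)   # 95% interval for the V1 mean
print(-1.094 / se_diff)                                # t-statistic for V1 - V2
```

Small differences in the last reported digit relative to the text can arise from rounding of the matrix entries.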

Many of the inference results given above can be obtained from the SAS®-GLM procedure when one uses a MANOVA option along with its M = option. To illustrate, the data in Table 27.1 are reanalyzed using the SAS commands given in Table 27.2. The first MANOVA option is used to obtain $\hat{E}$, and the second MANOVA option is used to obtain tests that compare Variety main effect means and Fertilizer main effect means [note that M = (0.2 0.2 0.2 0.2 0.2) tells the MANOVA option to average across the five repeated time measures]. Tests for Fertilizer x Time interaction and Variety x Time interaction are obtained from the third MANOVA option where M is a 4 x 5 matrix of time contrasts.


TABLE 27.2

SAS-GLM Code to Analyze the Data in Table 27.1 Using MANOVA

PROC GLM DATA=LAI;
  CLASSES FERTILIZER VARIETY;
  MODEL LAI1-LAI5=FERTILIZER VARIETY/NOUNI;
  /* print the error SSCP matrix */
  MANOVA /PRINTE;
  /* main effects: average over the five weeks */
  MANOVA H=FERTILIZER VARIETY
         M=(.2 .2 .2 .2 .2);
  /* interactions with time: four time contrasts */
  MANOVA H=FERTILIZER VARIETY M=(1 -1  0  0  0,
                                 1  0 -1  0  0,
                                 1  0  0 -1  0,
                                 1  0  0  0 -1);
  CONTRAST 'V1-V2' VARIETY 1 -1 0 0;
RUN;
Table 27.3 gives the value of $\hat{E}$ obtained from the MANOVA analysis in Table 27.2. Table 27.4 gives the MANOVA tests for Fertilizer and Variety main effects, and Table 27.5 gives the MANOVA interaction tests.

MANOVA and CONTRAST options can both be used to test hypotheses of the form $H_0\colon c'Bm = 0$. For example, by adding the following three statements to the SAS commands in Table 27.2

CONTRAST 'V1-V2' VARIETY 1 -1 0 0;

MANOVA M=(.2 .2 .2 .2 .2) / PRINTE;

MANOVA M=(1 0 0 0 0) / PRINTE;

one can obtain a Wilks statistic that compares the Variety 1 and 2 main effect means to one another from the first MANOVA statement, and a Wilks statistic that compares Variety 1 to Variety 2 at Time 1 from the second MANOVA statement. By including the PRINTE option in each of the above MANOVA commands, one also gets values of $m'\hat{E}m$. Table 27.6 gives a test statistic (F = 169.74) that compares the Variety 1 main effect mean to the Variety 2 main effect mean. Note that 169.74 is, up to rounding, the square of the t-statistic given earlier in this section. Also note that $m'\hat{E}m = 0.2115$ when m' = [0.2 0.2 0.2 0.2 0.2].


TABLE 27.3

Error Sums of Squares and Cross-Products Matrix

E = Error SSCP Matrix

         LAI1       LAI2       LAI3       LAI4       LAI5
LAI1   0.23699    0.171285   0.162175   0.22799    0.129105
LAI2   0.171285   0.24678    0.16292    0.231465   0.135435
LAI3   0.162175   0.16292    0.26763    0.30261    0.184435
LAI4   0.22799    0.231465   0.30261    0.39232    0.24145
LAI5   0.129105   0.135435   0.184435   0.24145    0.24683


TABLE 27.4

Tests for Fertilizer and Variety Main Effects

M Matrix Describing Transformed Variables

         LAI1   LAI2   LAI3   LAI4   LAI5
MVAR1    0.2    0.2    0.2    0.2    0.2

MANOVA Test Criteria and Exact F Statistics for the Hypothesis of No Overall Fertilizer Effect
on the Variables Defined by the M Matrix Transformation
H = Type III SSCP Matrix for Fertilizer   E = Error SSCP Matrix
S = 1   M = 1   N = 5

Statistic        Value        F-Value   Num df   Den df   Pr > F
Wilks's lambda   0.03081325   94.36     4        12       <0.0001

MANOVA Test Criteria and Exact F Statistics for the Hypothesis of No Overall Variety Effect
on the Variables Defined by the M Matrix Transformation
H = Type III SSCP Matrix for Variety   E = Error SSCP Matrix
S = 1   M = 0.5   N = 5

Statistic        Value        F-Value   Num df   Den df   Pr > F
Wilks's lambda   0.04345320   88.05     3        12       <0.0001

TABLE 27.5

MANOVA Interaction Tests

M Matrix Describing Transformed Variables

         LAI1   LAI2   LAI3   LAI4   LAI5
MVAR1     1     -1      0      0      0
MVAR2     1      0     -1      0      0
MVAR3     1      0      0     -1      0
MVAR4     1      0      0      0     -1

MANOVA Test Criteria and F Approximations for the Hypothesis of No Overall Fertilizer Effect
on the Variables Defined by the M Matrix Transformation
H = Type III SSCP Matrix for Fertilizer   E = Error SSCP Matrix
S = 4   M = -0.5   N = 3.5

Statistic        Value        F-Value   Num df   Den df   Pr > F
Wilks's lambda   0.10535731   1.91      16       28.133   0.0640

MANOVA Test Criteria and F Approximations for the Hypothesis of No Overall Variety Effect
on the Variables Defined by the M Matrix Transformation
H = Type III SSCP Matrix for Variety   E = Error SSCP Matrix
S = 3   M = 0   N = 3.5

Statistic        Value        F-Value   Num df   Den df   Pr > F
Wilks's lambda   0.01425985   8.00      12       24.103   <0.0001


TABLE 27.6

Test Comparing the Variety 1 and Variety 2 Main Effect Means

M Matrix Describing Transformed Variables

         LAI1   LAI2   LAI3   LAI4   LAI5
MVAR1    0.2    0.2    0.2    0.2    0.2

E = Error SSCP Matrix

         MVAR1
MVAR1    0.2115316

MANOVA Test Criteria and Exact F Statistics for the Hypothesis of No Overall V1-V2 Effect
on the Variables Defined by the M Matrix Transformation
H = Contrast SSCP Matrix for V1-V2   E = Error SSCP Matrix
S = 1   M = -0.5   N = 5

Statistic        Value        F-Value   Num df   Den df   Pr > F
Wilks's lambda   0.06602890   169.74    1        12       <0.0001


27.3 p-Value Adjustment Methods 

Recall that the analysis of a repeated measures experiment when the repeated measures satisfy the compound symmetry assumptions can be obtained by the split-plot-in-time methods described in Chapter 26. If one is only considering tests that involve differences in the time factor, then the split-plot-in-time tests are also valid when the repeated measures satisfy the H-F conditions. A second approach to analyzing repeated measures experiments has been shown to be more powerful than analyses based on the MANOVA approach described in Section 27.2. This approach is to perform the split-plot-in-time analysis and, should the H-F conditions not be satisfied, to adjust the numerator and denominator degrees of freedom of the test statistics, resulting in adjusted p-values.

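Suppose

\[
\Sigma = \begin{bmatrix}
\sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\
\sigma_{21} & \sigma_{22} & \cdots & \sigma_{2p} \\
\vdots & \vdots & & \vdots \\
\sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp}
\end{bmatrix}
\]

is the covariance matrix of the repeated measures. Let

\[
\theta = \frac{p^2(\bar{\sigma}_{ii} - \bar{\sigma}_{\cdot\cdot})^2}
{(p - 1)\left[\sum_{i=1}^{p}\sum_{j=1}^{p}\sigma_{ij}^2 - 2p\sum_{i=1}^{p}\bar{\sigma}_{i\cdot}^2 + p^2\bar{\sigma}_{\cdot\cdot}^2\right]} \qquad (27.12)
\]

where

\[
\bar{\sigma}_{ii} = \frac{1}{p}\sum_{j=1}^{p}\sigma_{jj}, \qquad
\bar{\sigma}_{i\cdot} = \frac{1}{p}\sum_{j=1}^{p}\sigma_{ij}, \qquad \text{and} \qquad
\bar{\sigma}_{\cdot\cdot} = \frac{1}{p^2}\sum_{i=1}^{p}\sum_{j=1}^{p}\sigma_{ij}
\]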
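
Equation 27.12 translates directly into code. The following is a minimal Python sketch: the function name `box_theta` and the two example matrices are hypothetical and serve only as illustrations. The first matrix has compound symmetry, for which θ should equal 1; the second has a first-order autoregressive pattern with correlation 0.5, for which θ should fall strictly between 1/(p − 1) and 1.

```python
def box_theta(cov):
    """Box's measure theta (Equation 27.12) for a p x p covariance matrix."""
    p = len(cov)
    mean_diag = sum(cov[i][i] for i in range(p)) / p          # mean of the variances
    row_means = [sum(row) / p for row in cov]                 # mean of each row
    grand_mean = sum(row_means) / p                           # mean of all elements
    sum_squares = sum(x * x for row in cov for x in row)      # sum of squared elements
    numerator = p * p * (mean_diag - grand_mean) ** 2
    denominator = (p - 1) * (sum_squares
                             - 2 * p * sum(r * r for r in row_means)
                             + p * p * grand_mean ** 2)
    return numerator / denominator

# Compound symmetry: variance 4, covariance 3 (hypothetical values).
compound_symmetry = [[4.0 if i == j else 3.0 for j in range(4)] for i in range(4)]

# First-order autoregressive pattern with correlation 0.5 (hypothetical values).
ar1 = [[0.5 ** abs(i - j) for j in range(3)] for i in range(3)]

print(box_theta(compound_symmetry))
print(box_theta(ar1))
```

Because θ is a ratio, multiplying the covariance matrix by a constant leaves it unchanged, so the function can be applied either to an estimated covariance matrix or to the corresponding error SSCP matrix.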


Box (1954) proposed $\theta$ as a measure of how far $\Sigma$ deviates from compound symmetry and he showed that $1/(p - 1) \le \theta \le 1$. The smaller the value of $\theta$, the further $\Sigma$ is from compound symmetry. Suppose that $F_{\text{TIME}}$ and $F_{\text{TIME} \times \text{TRT}}$ are the split-plot-in-time test statistics for Time main effect and Time x Trt interaction, respectively, for a repeated measures scenario such as that described by Table 26.1. Box showed that, when $\Sigma$ deviates from compound symmetry by $\theta$, then

\[
F_{\text{TIME}} \text{ is approximately distributed as } F[\theta(p - 1),\ \theta(N - t)(p - 1)] \qquad (27.13)
\]

when there is no time effect, and

\[
F_{\text{TIME} \times \text{TRT}} \text{ is approximately distributed as } F[\theta(t - 1)(p - 1),\ \theta(N - t)(p - 1)] \qquad (27.14)
\]

when there is no interaction effect.

Unfortunately, $\theta$ is not known since the $\sigma_{ij}$ are not known. If the $\sigma_{ij}$ can be estimated, then $\theta$ can be estimated.

Three estimates of $\theta$ have been proposed. The first of these is called Box's conservative estimate. Box (1954) suggested that a conservative approach is to take $\theta$ as its minimum possible value. That is, take $\theta = 1/(p - 1)$. This is probably too conservative to be recommended.
A second possibility was proposed by Greenhouse and Geisser (1959). Let $Q = C\hat{\Sigma}C'$, where $\hat{\Sigma}$ is an estimate of $\Sigma$ and C is any (p - 1) x p matrix satisfying $C\mathbf{j} = \mathbf{0}$ and $CC' = I_{p-1}$, where $\mathbf{j}$ is a p x 1 vector of 1's and $I_{p-1}$ is the (p - 1) x (p - 1) identity matrix. Then

\[
\hat{\theta} = \frac{\left(\sum_{i=1}^{p-1} q_{ii}\right)^2}{(p - 1)\sum_{i=1}^{p-1}\sum_{j=1}^{p-1} q_{ij}^2}
= \frac{[\operatorname{tr}(Q)]^2}{(p - 1)\operatorname{tr}(QQ')}
\]

is the Greenhouse and Geisser (G-G) estimate of $\theta$, where tr(B) is the trace of the matrix B.

A third method of estimating $\theta$ was suggested by Huynh and Feldt (1976). Their approach estimates $\theta$ by

\[
\tilde{\theta} = \frac{N(p - 1)\hat{\theta} - 2}{(p - 1)\left(N - r - (p - 1)\hat{\theta}\right)}
\]

where N = the total sample size and N - r = degrees of freedom for error in a split-plot-in-time analysis.

It can be noted that Box's correction is the most conservative of the three adjustment methods and that the H-F correction is the least conservative of the three. It should also be noted that, if any of the estimates of $\theta$ are greater than 1, then 1 is substituted for $\theta$ in Equations 27.13 and 27.14 when computing adjusted p-values because one would not want to increase the degrees of freedom for any effect.

It can be noted that if $\Sigma$ satisfies the H-F conditions, then $C\Sigma C' = \lambda I$ for some $\lambda$. When this is true, one says that $C\Sigma C'$ satisfies a sphericity condition. A likelihood ratio test of $H_0\colon C\Sigma C' = \lambda I$ is available. The likelihood ratio test statistic is given by

\[
\Lambda = \frac{|Q|}{\left[\dfrac{\operatorname{tr}(Q)}{p - 1}\right]^{p-1}} \qquad (27.15)
\]

and one rejects $H_0$ if $-2\log_e(\Lambda) > \chi^2_{\alpha,\,p(p-1)/2-1}$. There are many C matrices that satisfy $C\mathbf{j} = \mathbf{0}$ and $CC' = I_{p-1}$. It should be noted that the value of $\Lambda$ does not depend on which possible C matrix might be selected.

As an example, consider the data in Table 27.1. For these data

\[
\hat{E} = \begin{bmatrix}
0.237 & 0.171 & 0.162 & 0.228 & 0.129 \\
0.171 & 0.247 & 0.163 & 0.231 & 0.135 \\
0.162 & 0.163 & 0.268 & 0.303 & 0.184 \\
0.228 & 0.231 & 0.303 & 0.392 & 0.241 \\
0.129 & 0.135 & 0.184 & 0.241 & 0.247
\end{bmatrix}
\]

and is based on 12 degrees of freedom.
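Therefore

\[
\hat{\Sigma} = \frac{1}{12}\hat{E} = \frac{1}{12}\begin{bmatrix}
0.237 & 0.171 & 0.162 & 0.228 & 0.129 \\
0.171 & 0.247 & 0.163 & 0.231 & 0.135 \\
0.162 & 0.163 & 0.268 & 0.303 & 0.184 \\
0.228 & 0.231 & 0.303 & 0.392 & 0.241 \\
0.129 & 0.135 & 0.184 & 0.241 & 0.247
\end{bmatrix}
= \begin{bmatrix}
0.0198 & 0.0143 & 0.0135 & 0.0190 & 0.0108 \\
0.0143 & 0.0206 & 0.0136 & 0.0193 & 0.0113 \\
0.0135 & 0.0136 & 0.0223 & 0.0253 & 0.0153 \\
0.0190 & 0.0193 & 0.0253 & 0.0327 & 0.0201 \\
0.0108 & 0.0113 & 0.0153 & 0.0201 & 0.0206
\end{bmatrix}
\]

Taking

\[
C = \begin{bmatrix}
\frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} & 0 & 0 & 0 \\
\frac{1}{\sqrt{6}} & \frac{1}{\sqrt{6}} & -\frac{2}{\sqrt{6}} & 0 & 0 \\
\frac{1}{\sqrt{12}} & \frac{1}{\sqrt{12}} & \frac{1}{\sqrt{12}} & -\frac{3}{\sqrt{12}} & 0 \\
\frac{1}{\sqrt{20}} & \frac{1}{\sqrt{20}} & \frac{1}{\sqrt{20}} & \frac{1}{\sqrt{20}} & -\frac{4}{\sqrt{20}}
\end{bmatrix}
\]

one gets

\[
Q = C\hat{\Sigma}C' = \begin{bmatrix}
0.00592 & -0.00019 & -0.00003 & 0.00013 \\
-0.00019 & 0.00830 & 0.00399 & 0.00178 \\
-0.00003 & 0.00399 & 0.00486 & 0.00078 \\
0.00013 & 0.00178 & 0.00078 & 0.00875
\end{bmatrix}
\]

One then gets $\hat{\theta} = 0.795$ and $\tilde{\theta} = 1.747$. The value of the likelihood ratio test statistic for testing $H_0\colon C\Sigma C' = \lambda I$ is

\[
\Lambda = \frac{|Q|}{\left[\dfrac{\operatorname{tr}(Q)}{p - 1}\right]^{p-1}}
= \frac{1.20895(10)^{-9}}{\left(\dfrac{0.027831}{4}\right)^{4}} = 0.5159
\]

Thus $-2\log_e(\Lambda) = 1.32$, which would be compared to $\chi^2_{0.05,\,9} = 16.919$. Thus $H_0\colon C\Sigma C' = \lambda I$ cannot be rejected for these data.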
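
The G-G estimate, the H-F estimate, and the sphericity statistic in this example can be recomputed from the printed matrix Q. The following is a minimal Python sketch under stated assumptions: the function names are hypothetical, the entries of Q are the five-decimal values printed above, and the determinant is computed by a plain Gaussian elimination written for this illustration. Because Q is entered to only five decimals, the results agree with the text to about two or three digits rather than exactly.

```python
import math

# Q = C * Sigma_hat * C' as printed above (rounded to five decimals).
Q = [
    [0.00592, -0.00019, -0.00003, 0.00013],
    [-0.00019, 0.00830, 0.00399, 0.00178],
    [-0.00003, 0.00399, 0.00486, 0.00078],
    [0.00013, 0.00178, 0.00078, 0.00875],
]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def determinant(a):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in a]
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det                      # a row swap changes the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n):
                m[r][k] -= factor * m[col][k]
    return det

def greenhouse_geisser(q):
    """[tr(Q)]^2 / ((p-1) tr(QQ')); tr(QQ') is the sum of squared entries."""
    k = len(q)                              # k = p - 1
    return trace(q) ** 2 / (k * sum(x * x for row in q for x in row))

def huynh_feldt(theta_gg, n_total, error_df, p):
    """H-F estimate from the G-G estimate; error_df is N - r."""
    return (n_total * (p - 1) * theta_gg - 2) / (
        (p - 1) * (error_df - (p - 1) * theta_gg))

def sphericity_lambda(q):
    """Likelihood ratio statistic of Equation 27.15."""
    k = len(q)
    return determinant(q) / (trace(q) / k) ** k

theta_gg = greenhouse_geisser(Q)
theta_hf = huynh_feldt(theta_gg, n_total=20, error_df=12, p=5)
lam = sphericity_lambda(Q)

print(theta_gg, theta_hf, min(1.0, theta_hf))
print(lam, -2 * math.log(lam))
```

The last value on the first printed line applies the rule stated earlier: an estimate greater than 1 is replaced by 1 before adjusting degrees of freedom.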

Suppose the data in Table 27.1 are analyzed using a split-plot-in-time analysis using 
SAS-GLM procedure. The SAS commands used are given in Table 27.7. 
TABLE 27.7

SAS-GLM Code to Analyze the Data in Table 27.1 Using a Split-Plot-In-Time Analysis

/* reshape one row per plot into one row per plot and week */
DATA LAI2; SET LAI;
  DROP LAI1-LAI5;
  TIME=1; LAI=LAI1; OUTPUT;
  TIME=2; LAI=LAI2; OUTPUT;
  TIME=3; LAI=LAI3; OUTPUT;
  TIME=4; LAI=LAI4; OUTPUT;
  TIME=5; LAI=LAI5; OUTPUT;
ODS RTF FILE='C:\TEMP.RTF';
PROC GLM DATA=LAI2;
  CLASSES FERTILIZER VARIETY TIME;
  MODEL LAI=FERTILIZER VARIETY FERTILIZER*VARIETY TIME FERTILIZER*TIME
            VARIETY*TIME;
  /* whole-plot error term for the Fertilizer and Variety tests */
  RANDOM FERTILIZER*VARIETY/TEST;
RUN;


TABLE 27.8

Split-Plot-In-Time Tests for Main Effects

Source        df   Type III SS   Mean Square   F-Value   Pr > F
Fertilizer     4   33.267126     8.316781      94.36     <0.0001
Variety        3   23.282507     7.760836      88.05     <0.0001
Error         12    1.057658     0.088138

Error: MS (Fertilizer x Variety)


The tests for Fertilizer and Variety main effects are given in Table 27.8 and the tests for the
Time main effect, as well as the Time x Fertilizer and Time x Variety interaction effects are
given in Table 27.9 along with G-G adjusted degrees of freedom. The adjusted degrees
of freedom were computed from the split-plot-in-time values by multiplying each of the
original degrees of freedom by $\hat\theta$. The tests in Table 27.8 need no adjustment. Note that the
F statistics in Table 27.8 agree with the MANOVA test statistics for Variety and Fertilizer
main effects given earlier. Even with the adjusted degrees of freedom being used, all
significance probabilities in Table 27.9 are still less than 0.0001.
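
As a small illustration of the adjustment just described, the sketch below multiplies each within-subject degrees of freedom by the Greenhouse-Geisser estimate. It is written in Python rather than SAS, and it assumes the rounded value 0.795 quoted earlier, which is what appears to reproduce the adjusted values shown in Table 27.9; the dictionary names are hypothetical.

```python
# Greenhouse-Geisser estimate as quoted (rounded) in the text.
theta_hat = 0.795

# Unadjusted degrees of freedom for the within-subject lines of the
# split-plot-in-time analysis.
within_subject_df = {
    "Time": 4,
    "Fertilizer x Time": 16,
    "Variety x Time": 12,
    "Error(Time)": 48,
}

# Adjusted df = theta_hat * original df, rounded to two decimals for display.
adjusted_df = {
    source: round(theta_hat * df, 2) for source, df in within_subject_df.items()
}

for source, df in adjusted_df.items():
    print(f"{source:<18} {within_subject_df[source]:>3} -> {df:.2f}")
```

Using the unrounded estimate 0.7954 instead would change the second decimal of the larger entries slightly.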

The SAS-GLM procedure with its Repeated option can also be used to produce the
analyses described in this section as well as some of the results of a MANOVA analysis.
Table 27.10 gives the SAS commands that can be used for such an analysis of the data in
Table 27.1.

Table 27.11, in the row labeled "Orthogonal components," gives the test for $H_0: C\Sigma C' = \lambda I$.
That is, this is the test of whether the H-F conditions are satisfied or not. Table 27.12 gives
the MANOVA test for comparing Time main effects. Table 27.13 gives the MANOVA tests
for Time x Variety interaction and Time x Fertilizer interaction, and Table 27.14 gives the tests
for Variety and Fertilizer main effects. Finally, Table 27.15 gives the split-plot-in-time
analyses of Time main effects as well as the Time x Variety and Time x Fertilizer interactions.
Three different sets of p-values are given. They are unadjusted split-plot-in-time p-values,
Greenhouse-Geisser (G-G) adjusted p-values, and Huynh-Feldt (H-F) adjusted p-values.
The values of $\hat\theta = 0.7954$ and $\tilde\theta = 1.7473$ are also given.


TABLE 27.9

Split-Plot-in-Time Tests with G-G Adjusted Degrees of Freedom

Source              df (Adj df)   Type III SS   Mean Square   F-Value   Pr > F
Time                 4 (3.18)     20.478356     5.119589      738.20    <0.0001
Fertilizer x time   16 (12.72)     0.674324     0.042145        6.08    <0.0001
Variety x time      12 (9.54)      1.246268     0.103856       14.98    <0.0001
Error: MS(Error)    48 (38.16)     0.332892     0.006935



TABLE 27.10

SAS-GLM Code to Analyze the Data in Table 27.1 Using the Repeated Option

PROC GLM DATA=LAI;
CLASSES FERTILIZER VARIETY;
MODEL LAI1-LAI5=FERTILIZER VARIETY/NOUNI;
REPEATED TIME 5 (1 2 3 4 5) / SUMMARY PRINTE;
RUN;


TABLE 27.11

Test for $H_0: C\Sigma C' = \lambda I$

Sphericity Tests

Variables               df   Mauchly's Criterion   Chi-Square   Pr > Chi-Square
Transformed variates     9   0.1009828             23.883383    0.0045
Orthogonal components    9   0.5208938              6.7938446   0.6586


TABLE 27.12

MANOVA Test for Time Main Effect

MANOVA Test Criteria and Exact F Statistics for the Hypothesis of No Time Effect
H = Type III SSCP Matrix for Time   E = Error SSCP Matrix
S = 1   M = 1   N = 3.5

Statistic        Value        F-Value   Num df   Den df   Pr > F
Wilks' lambda    0.00502375   445.62    4        9        <0.0001





TABLE 27.13

MANOVA Test for Interactions between Time and Fertilizer and Variety

MANOVA Test Criteria and F Approximations for the Hypothesis of No Time x Fertilizer Effect
H = Type III SSCP Matrix for Time x Fertilizer   E = Error SSCP Matrix
S = 4   M = -0.5   N = 3.5

Statistic        Value        F-Value   Num df   Den df   Pr > F
Wilks' lambda    0.10535731   1.91      16       28.133   0.0640

MANOVA Test Criteria and F Approximations for the Hypothesis of No Time x Variety Effect
H = Type III SSCP Matrix for Time x Variety   E = Error SSCP Matrix
S = 3   M = 0   N = 3.5

Statistic        Value        F-Value   Num df   Den df   Pr > F
Wilks' lambda    0.01425985   8.00      12       24.103   <0.0001

TABLE 27.14

Tests for Fertilizer and Variety Main Effects

Source        df   Type III SS    Mean Square   F-Value   Pr > F
Fertilizer     4   33.26712600    8.31678150    94.36     <0.0001
Variety        3   23.28250700    7.76083567    88.05     <0.0001
Error         12    1.05765800    0.08813817



TABLE 27.15

Split-Plot-In-Time Analysis with Adjusted p-Values

                                                                        Adj Pr > F
Source              df   Type III SS    Mean Square   F-Value   Pr > F    G-G       H-F
Time                 4   20.47835600    5.11958900    738.20    <0.0001   <0.0001   <0.0001
Time x fertilizer   16    0.67432400    0.04214525      6.08    <0.0001   <0.0001   <0.0001
Time x variety      12    1.24626800    0.10385567     14.98    <0.0001   <0.0001   <0.0001
Error (time)        48    0.33289200    0.00693525

Greenhouse-Geisser epsilon   0.7954
Huynh-Feldt epsilon          1.7473


27.4 Mixed Model Methods 

Consider once again the notation described in Section 27.1 where

$$y_{ik} = \begin{bmatrix} y_{i1k} \\ y_{i2k} \\ \vdots \\ y_{ipk} \end{bmatrix}$$

is the vector of responses for subject $k$ in treatment group $i$. Suppose that the error vector

$$\varepsilon^*_{ik} \sim \text{independent } N(0, \Sigma_i), \quad i = 1, 2, \ldots, t;\ k = 1, 2, \ldots, n_i \qquad (27.16)$$

Let $y$ be a vector that contains all of the data vectors. That is,

$$y' = [\,y'_{11}\ \ y'_{12}\ \cdots\ y'_{1n_1},\ \ y'_{21}\ \ y'_{22}\ \cdots\ y'_{2n_2},\ \ \ldots,\ \ y'_{t1}\ \ y'_{t2}\ \cdots\ y'_{tn_t}\,]$$

Under the conditions in Equation 27.16,

$$V = \mathrm{Cov}(y) = \begin{bmatrix}
\Sigma_1 & \cdots & 0 & \cdots & 0 & \cdots & 0 \\
\vdots & \ddots & \vdots & & \vdots & & \vdots \\
0 & \cdots & \Sigma_1 & \cdots & 0 & \cdots & 0 \\
\vdots & & \vdots & \ddots & \vdots & & \vdots \\
0 & \cdots & 0 & \cdots & \Sigma_t & \cdots & 0 \\
\vdots & & \vdots & & \vdots & \ddots & \vdots \\
0 & \cdots & 0 & \cdots & 0 & \cdots & \Sigma_t
\end{bmatrix}$$

a block-diagonal matrix in which $\Sigma_i$ appears $n_i$ times along the diagonal and every other block is 0.


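Suppose $y \sim N(X\beta, V)$ or equivalently, $y = X\beta + \varepsilon$ where $\varepsilon \sim N(0, V)$. Suppose for
now that $V$ is known and that $\ell'\beta$ is estimable. In this case, it can be shown that the best
estimate of $\ell'\beta$ is $\ell'\hat\beta_V$, where $\hat\beta_V = (X'V^{-1}X)^{-}X'V^{-1}y$. Furthermore, the standard error of $\ell'\hat\beta_V$
can be shown to be equal to $\sqrt{\ell'(X'V^{-1}X)^{-}\ell}$. The estimator $\ell'\hat\beta_V$ is called a generalized least
squares estimator of $\ell'\beta$. Unfortunately, $V$ is rarely known. But suppose $V$ could be
estimated by $\hat V$; then $\ell'\beta$ could be estimated by $\ell'\hat\beta_{\hat V}$ where $\hat\beta_{\hat V} = (X'\hat V^{-1}X)^{-}X'\hat V^{-1}y$, and
an estimate of the standard error of $\ell'\hat\beta_{\hat V}$ could be taken as $\widehat{\mathrm{s.e.}}(\ell'\hat\beta_{\hat V}) = \sqrt{\ell'(X'\hat V^{-1}X)^{-}\ell}$. The
estimator $\ell'\hat\beta_{\hat V}$ is called an estimated generalized least squares estimator of $\ell'\beta$. An
approximate t-statistic, whose degrees of freedom must also be approximated, that tests
$\ell'\beta = 0$ is given by

$$t = \frac{\ell'\hat\beta_{\hat V}}{\sqrt{\ell'(X'\hat V^{-1}X)^{-}\ell}} \qquad (27.17)$$

Suppose one wants to test $H_0: H\beta = 0$ vs $H_a: H\beta \neq 0$ where $H$ is a $q \times p$ matrix of rank $q$. An
approximate F-statistic that can be used to test $H_0$ is given by

$$F = \frac{(H\hat\beta_{\hat V})'\left[H(X'\hat V^{-1}X)^{-}H'\right]^{-1}(H\hat\beta_{\hat V})}{q} \qquad (27.18)$$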
The numerator degrees of freedom for this F-statistic is given by q and the denominator
degrees of freedom must be approximated. Approximation methods are beyond the scope 
of this book, but two methods can be recommended. If there are no missing data values, 
a method known as Satterthwaite's method can be used. This is a generalization of 
Satterthwaite's method discussed in Chapter 2. See Giesbrecht and Burns (1985), McLean 
and Sanders (1988), and Fai and Cornelius (1996) for more details. When there are missing 
values, the method known as the Kenward-Roger method should be used. See Kenward
and Roger (1997) for more details.
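
To make the generalized least squares formulas concrete, here is a deliberately tiny sketch in Python. It assumes a hypothetical situation that is not in the text: two observations sharing a common mean (so X is a column of ones and the contrast vector is the scalar 1) and a known 2 x 2 covariance matrix V. The function name and the data are illustrative only, and the degrees of freedom question discussed above is not addressed.

```python
import math


def gls_common_mean(y, v):
    """GLS estimate of a common mean (X = column of ones) for two observations
    with known 2 x 2 covariance matrix v, together with its standard error."""
    (a, b), (c, d) = v
    det = a * d - b * c
    v_inv = [[d / det, -b / det], [-c / det, a / det]]
    # X'V^{-1}X is the sum of all elements of V^{-1} when X is a column of ones.
    xt_vinv_x = sum(sum(row) for row in v_inv)
    # X'V^{-1}y is the sum over i of (row i of V^{-1}) times y.
    xt_vinv_y = sum(v_inv[i][j] * y[j] for i in range(2) for j in range(2))
    beta_hat = xt_vinv_y / xt_vinv_x
    std_err = math.sqrt(1.0 / xt_vinv_x)
    return beta_hat, std_err


# Hypothetical data: the second observation has four times the variance of the first,
# so it receives one quarter of the weight.
beta_hat, std_err = gls_common_mean(y=[10.0, 14.0], v=[[1.0, 0.0], [0.0, 4.0]])
t_stat = beta_hat / std_err  # the ratio in Equation 27.17 with V treated as known
print(beta_hat, round(std_err, 4), round(t_stat, 3))
```

With V replaced by an estimate, the same arithmetic gives the estimated generalized least squares estimator; the only change is where V comes from.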

Mixed-model procedures can be used to obtain estimates of the $\Sigma_i$ from which $V$ can be
estimated. The methods of estimation that are used are likelihood methods based on the
assumptions in Equation 27.16, that is, on $y$ having a multivariate normal distribution.
Such methods are beyond the scope of this book except that some general comments will
be made later. The estimate of $\Sigma_i$ depends on whether $\Sigma_i$ has any structure. Various
structures can be considered. The more popular structures include a compound symmetry
structure, a Huynh-Feldt structure, an AR(1) structure, heterogeneous compound
symmetry, and heterogeneous AR(1). Mixed-model programs can also handle an unstructured
case where $\Sigma_i$ has no structure at all. Once $V$ is estimated, the mixed model procedures can
compute $\ell'\hat\beta_{\hat V}$ and its estimated standard error. Table 27.16 gives some of the more popular
covariance structures for repeated measures. The structures in Table 27.16 assume that
each of the $t$ treatments has the same parameters in its covariance structure. That is, it
assumes that all of the $\Sigma_i$ are equal. It is also possible to estimate covariance parameters
separately for each treatment.

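Summarizing, the repeated measures model is

$$y_{ijk} = \mu + \alpha_i + \tau_j + \gamma_{ij} + \varepsilon^*_{ijk}, \quad i = 1, 2, \ldots, t;\ j = 1, 2, \ldots, p;\ k = 1, 2, \ldots, n_i \qquad (27.19)$$

where

$$\varepsilon^*_{ik} = \begin{bmatrix} \varepsilon^*_{i1k} \\ \varepsilon^*_{i2k} \\ \vdots \\ \varepsilon^*_{ipk} \end{bmatrix}$$

is the vector of errors for subject $k$ in treatment group $i$ and where $\varepsilon^*_{ik} \sim$ independent $N(0, \Sigma_i)$,
$i = 1, 2, \ldots, t$; $k = 1, 2, \ldots, n_i$. Additional covariance structures can be obtained by adding a
random subject component to the model in Equation 27.19. Such a model would be given by

$$y_{ijk} = \mu + \alpha_i + \delta_{ik} + \tau_j + \gamma_{ij} + \varepsilon^{**}_{ijk}, \quad i = 1, 2, \ldots, t;\ j = 1, 2, \ldots, p;\ k = 1, 2, \ldots, n_i \qquad (27.20)$$

where the $\delta_{ik} \sim$ i.i.d. $N(0, \sigma_\delta^2)$, $i = 1, 2, \ldots, t$; $k = 1, 2, \ldots, n_i$.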

Adding a random subject component to the model changes the covariance structure of
the repeated measures to

$$\mathrm{Cov}(y_{ik}) = \mathrm{Cov}\begin{bmatrix} y_{i1k} \\ \vdots \\ y_{ipk} \end{bmatrix} = \sigma_\delta^2 J_p + \Sigma_i$$
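TABLE 27.16

Some Popular Covariance Structures for Repeated Measures Experiments

Compound symmetry structure (two parameters)

$$\Sigma_i = \sigma^2 \begin{bmatrix}
1 & \rho & \cdots & \rho \\
\rho & 1 & \cdots & \rho \\
\vdots & \vdots & \ddots & \vdots \\
\rho & \rho & \cdots & 1
\end{bmatrix} \quad \text{for } i = 1, 2, \ldots, t$$

Huynh-Feldt structure (p + 1 parameters)

$$\Sigma_i = \begin{bmatrix}
\sigma^2 + 2\eta_1 & \eta_1 + \eta_2 & \cdots & \eta_1 + \eta_p \\
\eta_2 + \eta_1 & \sigma^2 + 2\eta_2 & \cdots & \eta_2 + \eta_p \\
\vdots & \vdots & \ddots & \vdots \\
\eta_p + \eta_1 & \eta_p + \eta_2 & \cdots & \sigma^2 + 2\eta_p
\end{bmatrix} \quad \text{for } i = 1, 2, \ldots, t$$

AR(1) structure (two parameters)

$$\Sigma_i = \sigma^2 \begin{bmatrix}
1 & \rho & \rho^2 & \cdots & \rho^{p-1} \\
\rho & 1 & \rho & \cdots & \rho^{p-2} \\
\rho^2 & \rho & 1 & \cdots & \rho^{p-3} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\rho^{p-1} & \rho^{p-2} & \rho^{p-3} & \cdots & 1
\end{bmatrix} \quad \text{for } i = 1, 2, \ldots, t$$

Unstructured [p(p + 1)/2 parameters]

$$\Sigma_i = \begin{bmatrix}
\sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\
\sigma_{21} & \sigma_{22} & \cdots & \sigma_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp}
\end{bmatrix} \quad \text{for } i = 1, 2, \ldots, t$$

Heterogeneous compound symmetry structure (p + 1 parameters)

$$\Sigma_i = \begin{bmatrix}
\sigma_1^2 & \rho\sigma_1\sigma_2 & \cdots & \rho\sigma_1\sigma_p \\
\rho\sigma_2\sigma_1 & \sigma_2^2 & \cdots & \rho\sigma_2\sigma_p \\
\vdots & \vdots & \ddots & \vdots \\
\rho\sigma_p\sigma_1 & \rho\sigma_p\sigma_2 & \cdots & \sigma_p^2
\end{bmatrix} \quad \text{for } i = 1, 2, \ldots, t$$

Heterogeneous AR(1) structure (p + 1 parameters)

$$\Sigma_i = \begin{bmatrix}
\sigma_1^2 & \rho\sigma_1\sigma_2 & \rho^2\sigma_1\sigma_3 & \cdots & \rho^{p-1}\sigma_1\sigma_p \\
\rho\sigma_2\sigma_1 & \sigma_2^2 & \rho\sigma_2\sigma_3 & \cdots & \rho^{p-2}\sigma_2\sigma_p \\
\rho^2\sigma_3\sigma_1 & \rho\sigma_3\sigma_2 & \sigma_3^2 & \cdots & \rho^{p-3}\sigma_3\sigma_p \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\rho^{p-1}\sigma_p\sigma_1 & \rho^{p-2}\sigma_p\sigma_2 & \rho^{p-3}\sigma_p\sigma_3 & \cdots & \sigma_p^2
\end{bmatrix} \quad \text{for } i = 1, 2, \ldots, t$$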


where $J_p$ is a $p \times p$ matrix of 1s. Adding a random subject component has no effect
when $\Sigma_i$ satisfies compound symmetry or when $\Sigma_i$ has the unstructured form, but
adding a random subject component could be very useful for many other covariance
structures.
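
The structures in Table 27.16, and the effect of a random subject component, can be illustrated with a few small constructors. This is only a sketch in Python with hypothetical function names; the parameter values in the example are made up. It also shows numerically why adding a random subject component changes nothing under compound symmetry: the sum is again a compound symmetry matrix with different parameter values.

```python
def compound_symmetry(p, sigma2, rho):
    """sigma2 on the diagonal, sigma2 * rho everywhere else."""
    return [[sigma2 * (1.0 if i == j else rho) for j in range(p)] for i in range(p)]


def ar1(p, sigma2, rho):
    """Element (i, j) is sigma2 * rho ** |i - j|."""
    return [[sigma2 * rho ** abs(i - j) for j in range(p)] for i in range(p)]


def huynh_feldt(sigma2, eta):
    """Diagonal sigma2 + 2 * eta_i; off-diagonal eta_i + eta_j."""
    p = len(eta)
    return [
        [sigma2 + 2.0 * eta[i] if i == j else eta[i] + eta[j] for j in range(p)]
        for i in range(p)
    ]


def add_random_subject(sigma, sigma2_delta):
    """Cov(y_ik) = sigma2_delta * J_p + Sigma: add sigma2_delta to every element."""
    return [[value + sigma2_delta for value in row] for row in sigma]


# Hypothetical parameter values, chosen only for illustration.
cs = compound_symmetry(4, sigma2=10.0, rho=0.5)       # diagonal 10, off-diagonal 5
shifted = add_random_subject(cs, sigma2_delta=2.0)    # diagonal 12, off-diagonal 7
same_family = compound_symmetry(4, sigma2=12.0, rho=7.0 / 12.0)

for row in shifted:
    print(row)
```

The matrix `shifted` matches `same_family` element by element, so the extra variance component cannot be separated from the compound symmetry parameters. The same is not true for `ar1`, where adding a constant to every element produces a matrix outside the AR(1) family.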

Two methods that have been used to fit models such as those in Equations 27.16 and 27.17
are discussed. One is to find the maximum likelihood estimates (ML) of the model
parameters, and the other finds restricted maximum likelihood estimates (REML) of the model
parameters. Both of these methods are numerically intensive and fitting models by these
methods is beyond the scope of this book. Nevertheless, some of the issues involved with
these two methods are discussed next.
27.4.1 Maximum Likelihood Method

One can form a likelihood function based on the distribution of $y$, and consider the
likelihood as a function of the parameters in the covariance matrix $V$ and the fixed effect
parameters in $\beta$. Then one finds values of these parameters that maximize the likelihood
function over the parameter space. This method has several drawbacks: it is numerically
intensive; the resulting estimators may not give rise to unbiased estimates of parameter
functions that may be of interest, such as $\ell'\beta$; distributional properties of the estimators
of parameter functions may not be known except asymptotically; and solving the likelihood
equations requires an iterative process that may or may not converge and, even when it
converges, may converge to a local maximum rather than the global maximum. The greatest
disadvantage of the ML method is that it tends to underestimate the variance-covariance
parameters, which results in estimated standard errors of estimators of fixed effects that
are too small. This leads to type I error rates that are much higher than desired, and
confidence intervals that do not achieve the desired confidence levels. See Chapter 22 for
a discussion of a two-way mixed model.

27.4.2 Restricted Maximum Likelihood Method 

The restricted maximum likelihood method is also numerically intensive and
distributional properties of the estimates are not known except asymptotically. However, it is
preferred over the ML method because the resulting estimators of parameter functions of
interest have less bias, and the REML method does not underestimate the
variance-covariance parameters nearly as much as the ML method. This leads to estimated
standard errors of estimators of fixed effects that are more appropriate, which leads to type I
error rates that are more desirable, and confidence intervals that tend to be closer to the
desired confidence levels.

Consider the matrix form of the model, $y = X\beta + \varepsilon$ where $\varepsilon \sim N(0, V)$. Let $L$ be a full row
rank matrix that satisfies $LX = 0$ and such that rank($L$) = $n$ - rank($X$) where $n$ = the
dimension of $y$.

Let $y^* = Ly$. Then $y^* \sim N(0, LVL')$. The likelihood function formed from $y^*$ depends only
on the variance-covariance parameters. The REML estimates of the variance-covariance
parameters are the values of the parameters that maximize the restricted likelihood
function based on the distribution of $y^*$. Once estimates of the variance-covariance
parameters are found, $V$ can be estimated, and then test statistics can be computed from
Equations 27.17 and 27.18.

As a strategy for using mixed model methods, it is generally recommended that one 
determine an appropriate structure for the variance-covariance matrix of the repeated 
measures using REML, and once a covariance matrix structure is obtained, then one can 
consider inferences about estimable functions of the fixed effect parameters. See Chapter 
22 for a discussion of REML for a two-way mixed model. Some of the possibilities for 
choosing a covariance matrix structure for the repeated measures will be considered next. 

Statistical software such as the SAS-Mixed procedure computes several statistics that
provide useful information that can help one choose a covariance structure for the repeated
measures. The first such statistic was suggested by Akaike (1974) and is known as Akaike's
Information Criterion (AIC). A second statistic was suggested by Schwarz (1978) and is
known as Schwarz's Bayesian Criterion (BIC). A third possibility was given by Hurvich
and Tsai (1989) and is generally denoted AICC. Each of these criteria is a function of
$-2\log_e(L)$ where $L$ is the maximum of the restricted maximum likelihood function. The
criteria differ only in the way that $-2\log_e(L)$ is penalized as the number of parameters in
the covariance structure increases. For each of these three criteria, smaller is considered to
be better. Let $d$ = the number of covariance parameters in the structure being considered,
$N$ = the number of subjects, and $N^*$ = the total number of observations minus rank($X$). Then
each of these criteria is defined by:


$$\begin{aligned}
\mathrm{AIC} &= -2\log_e(L) + 2d \\
\mathrm{BIC} &= -2\log_e(L) + d\,[\log_e(N)] \\
\mathrm{AICC} &= -2\log_e(L) + 2d\,[N^*/(N^* - d - 1)]
\end{aligned} \qquad (27.21)$$
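
The three penalties in Equation 27.21 are easy to compute directly. The sketch below, in Python with a hypothetical function name, applies them to the compound symmetry fit of the Drug data analyzed later in this section, using the $-2\log_e(L)$ value reported in Table 27.32. It assumes N = 24 subjects and N* = 96 - 12 = 84, where rank(X) = 12 is an assumption based on the 3 x 4 Drug-by-Time cell-means model.

```python
import math


def information_criteria(minus_two_log_l, d, n_subjects, n_star):
    """Return (AIC, BIC, AICC) as defined in Equation 27.21."""
    aic = minus_two_log_l + 2 * d
    bic = minus_two_log_l + d * math.log(n_subjects)
    aicc = minus_two_log_l + 2 * d * n_star / (n_star - d - 1)
    return aic, bic, aicc


# Compound symmetry fit of the Drug data: d = 2 covariance parameters.
# n_star = 96 observations minus an assumed rank(X) of 12.
aic, bic, aicc = information_criteria(488.797, d=2, n_subjects=24, n_star=96 - 12)
print(round(aic, 3), round(bic, 3), round(aicc, 3))
```

Under these assumptions the results agree with the CS rows of Tables 27.29 through 27.31, which suggests the assumed values of N and N* match what the software used.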

When sample sizes are relatively large, one can also compute a likelihood ratio test
(LRT) statistic that can be used to compare two covariance structures for the repeated
measures whenever the structure under the null hypothesis is a special case of the
structure under the alternative. For example, LRT statistics can be obtained that would compare
a compound symmetry structure with an H-F structure or with an unstructured
structure, and an LRT statistic could be obtained that would compare an H-F structure or an
AR(1) structure with an unstructured structure, but one cannot get an LRT statistic to
compare a compound symmetry structure or an H-F structure to an AR(1) structure since
neither one is a special case of the other. In order to compare a compound symmetry
structure or an H-F structure with an AR(1) structure, one would need to rely upon the criteria
in Equation 27.21.

As an example, consider once again the Drug data in Table 26.4. Various analyses of 
these data will be obtained using the SAS-Mixed procedure. The basic SAS commands 
that can be used are shown in Table 27.17. The basic model is a two-way model with effects 
Drug, Time, and Drug x Time interaction. The Repeated statement indicates that Time is a 
repeated factor, the Type = CS option indicates that the model being fitted assumes the 
covariance matrix of the repeated measures has compound symmetry, and the 
Subject = Person option indicates that Person is a variable in the data set that identifies 
those observations which make up the repeated measures for a particular subject. Note 
that for these data the Person variable takes on the unique values of 1-24, one value for 
each subject. Caution: If one had let the Person variable have values 1-8 for each of the three 
drug groups, then one would have had to use Subject = Person(Drug) in order to identify 
each subject correctly. Finally, the R option is used so that the covariance matrix for 
Person = 1 will be provided in the output. Note that this analysis assumes that all 24
subjects have the same covariance matrix, $\Sigma$, so only the first one needs to be printed. If one
desired the covariance matrices for subjects 1, 9, and 17, then one would use R = 1, 9, 17
instead of just R.

The approach that is used in this example is to refit models assuming various covariance
structures in order to select the covariance structure that will be used when considering
inferences on the Drug and Time factors. Consequently, the data in this example were
reanalyzed using each of the Repeated statements given in Table 27.18. In each analysis, the
statement "ODS RTF SELECT R;" was used to get each of the R matrices placed in the RTF
output file, and the "ODS OUTPUT FITSTATISTICS = FIT1;" statement was used to create
a data set that contains the values of various fit statistics for each analysis. The fit statistics
obtained are AIC, AICC, and BIC as well as the value $-2\log_e(L)$ for each analysis.
TABLE 27.17

SAS-Mixed Code to Analyze the Data in Table 26.4 Using the Repeated Option

DATA HRT_RATE;
INPUT DRUG $ PERSON HR1-HR4 @@;
CARDS;
A  1 72 86 81 77   B  2 85 86 83 80   C  3 69 73 72 74
A  4 78 83 88 81   B  5 82 86 80 84   C  6 66 62 67 73
A  7 71 82 81 75   B  8 71 78 70 75   C  9 84 90 88 87
A 10 72 83 83 69   B 11 83 88 79 81   C 12 80 81 77 72
A 13 66 79 77 66   B 14 86 85 76 76   C 15 72 72 69 70
A 16 74 83 84 77   B 17 85 82 83 80   C 18 65 62 65 61
A 19 62 73 78 70   B 20 79 83 80 81   C 21 75 69 69 68
A 22 69 75 76 70   B 23 83 84 78 81   C 24 71 70 65 63
;

DATA HR; SET HRT_RATE; DROP HR1-HR4;
TIME = 1; HR = HR1; OUTPUT;
TIME = 2; HR = HR2; OUTPUT;
TIME = 3; HR = HR3; OUTPUT;
TIME = 4; HR = HR4; OUTPUT;

TITLE 'Analyses of Drug Data using the MIXED Procedure';
ODS RTF FILE = 'TEMP.RTF';

PROC MIXED DATA = HR;
CLASSES DRUG TIME PERSON;
MODEL HR = DRUG TIME DRUG*TIME;
REPEATED TIME/TYPE=CS SUBJECT=PERSON R;
ODS RTF SELECT R;
ODS OUTPUT FITSTATISTICS = FIT1;
TITLE2 'Repeated Measures Analysis - Assuming Compound Symmetry';
RUN;

DATA FIT1; SET FIT1; TYPE='CS';
RUN;

ODS RTF CLOSE;


TABLE 27.18

Covariance Structures Considered with Different Repeated Options

REPEATED TIME/TYPE=CS SUBJECT=PERSON R;
REPEATED TIME/TYPE=HF SUBJECT=PERSON R;
REPEATED TIME/TYPE=UN SUBJECT=PERSON R;
REPEATED TIME/TYPE=AR(1) SUBJECT=PERSON R;
REPEATED TIME/TYPE=CSH SUBJECT=PERSON R;
REPEATED TIME/TYPE=ARH(1) SUBJECT=PERSON R;
REPEATED TIME/TYPE=CS SUBJECT=PERSON R=1,9,17 GROUP=DRUG;
REPEATED TIME/TYPE=HF SUBJECT=PERSON R=1,9,17 GROUP=DRUG;
REPEATED TIME/TYPE=UN SUBJECT=PERSON R=1,9,17 GROUP=DRUG;
REPEATED TIME/TYPE=AR(1) SUBJECT=PERSON R=1,9,17 GROUP=DRUG;
The first six of the Repeated statements in Table 27.18 correspond to the covariance
structures given in Table 27.16. The last four Repeated statements include a Group = Drug 
option, which causes a different covariance matrix to be fitted for each Drug. The first 
person in each drug group is person numbers 1,9, and 17, and the R = 1,9,17 option places 
those estimated covariance matrices in the output. 

Table 27.19 gives the estimated covariance matrix of the repeated measures when
assuming compound symmetry. Note that $\hat\sigma^2 = 33.4182$ and

$$\hat\rho = \frac{25.9702}{33.4182} = 0.777$$

Table 27.20 gives the estimated covariance matrix assuming the H-F conditions.
Here

$$\hat\sigma^2 + 2\hat\eta_1 = 28.5296$$
$$\hat\sigma^2 + 2\hat\eta_2 = 40.2442$$

and

$$\hat\eta_1 + \hat\eta_2 = 26.9390$$

Solving these simultaneously for $\hat\sigma^2$, $\hat\eta_1$, and $\hat\eta_2$ gives

$$\hat\sigma^2 = 7.4479, \quad \hat\eta_1 = 10.5409, \quad \text{and} \quad \hat\eta_2 = 16.3982$$


TABLE 27.19

$\hat\Sigma$ for Compound Symmetry (CS)

Estimated R Matrix for Person 1

Row   Col1      Col2      Col3      Col4
1     33.4182   25.9702   25.9702   25.9702
2     25.9702   33.4182   25.9702   25.9702
3     25.9702   25.9702   33.4182   25.9702
4     25.9702   25.9702   25.9702   33.4182


TABLE 27.20

$\hat\Sigma$ for H-F Conditions

Estimated R Matrix for Person 1

Row   Col1      Col2      Col3      Col4
1     28.5296   26.9390   25.8070   22.1733
2     26.9390   40.2442   31.6643   28.0307
3     25.8070   31.6643   37.9802   26.8987
4     22.1733   28.0307   26.8987   30.7130
From these, and the other two diagonal elements of $\hat\Sigma$, one can get

$$\hat\eta_3 = 15.2671, \quad \text{and} \quad \hat\eta_4 = 11.6326$$
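
The simultaneous solution for the H-F parameters can be reproduced from Table 27.20 in a few lines. The sketch below is in Python with illustrative variable names: adding the first two diagonal equations and subtracting twice the (1, 2) covariance isolates the common variance term, after which each remaining parameter follows from its diagonal element.

```python
# Diagonal elements and the (1, 2) covariance from Table 27.20.
diagonal = [28.5296, 40.2442, 37.9802, 30.7130]
cov_12 = 26.9390

# (d1 + d2) / 2 = sigma2 + (eta1 + eta2), and cov_12 = eta1 + eta2.
sigma2_hat = (diagonal[0] + diagonal[1]) / 2.0 - cov_12

# Each diagonal element is sigma2 + 2 * eta_j.
eta_hat = [(d - sigma2_hat) / 2.0 for d in diagonal]

print(round(sigma2_hat, 4), [round(e, 4) for e in eta_hat])
```

The values for the variance term and for the first, second, and fourth parameters agree with the text to four decimals. For the third parameter this calculation gives about 15.266, slightly different from the printed 15.2671; the (1, 3) covariance in Table 27.20 (25.8070 less the first parameter) gives about 15.266 as well.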

Table 27.21 gives the estimated covariance matrix assuming an unstructured covariance
matrix. This is the same estimate that one gets from a SAS-GLM analysis when one includes
the MANOVA/PRINTE; option.

Table 27.22 gives the estimated covariance matrix assuming an AR(1) structure. Here
$\hat\sigma^2 = 32.4945$ and $\hat\rho = 26.7309/32.4945 = 0.8226$.

Table 27.23 gives the estimated covariance matrix assuming a heterogeneous compound
symmetry structure. In this case, one can get

$$\hat\sigma_1^2 = 31.1038, \quad \hat\sigma_2^2 = 38.6218, \quad \hat\sigma_3^2 = 29.3752, \quad \hat\sigma_4^2 = 34.7840$$

and

$$\hat\rho = \frac{27.0585}{\sqrt{31.1038 \cdot 38.6218}} = 0.7807$$


TABLE 27.21

$\hat\Sigma$ for Unstructured Case

Estimated R Matrix for Person 1

Row   Col1      Col2      Col3      Col4
1     30.5238   28.6548   25.4881   20.0952
2     28.6548   39.2321   29.3095   25.5476
3     25.4881   29.3095   31.2321   26.7262
4     20.0952   25.5476   26.7262   32.6845


TABLE 27.22

$\hat\Sigma$ for an AR(1) Structure

Estimated R Matrix for Person 1

Row   Col1      Col2      Col3      Col4
1     32.4945   26.7309   21.9896   18.0893
2     26.7309   32.4945   26.7309   21.9896
3     21.9896   26.7309   32.4945   26.7309
4     18.0893   21.9896   26.7309   32.4945
TABLE 27.23

$\hat\Sigma$ for a Compound Symmetry Structure with Heterogeneous Variances

Estimated R Matrix for Person 1

Row   Col1      Col2      Col3      Col4
1     31.1038   27.0585   23.5982   25.6790
2     27.0585   38.6218   26.2959   28.6146
3     23.5982   26.2959   29.3752   24.9552
4     25.6790   28.6146   24.9552   34.7840


TABLE 27.24

$\hat\Sigma$ for an AR(1) Structure with Heterogeneous Variances

Estimated R Matrix for Person 1

Row   Col1      Col2      Col3      Col4
1     30.7872   29.0118   21.4829   18.3212
2     29.0118   39.3259   29.1204   24.8346
3     21.4829   29.1204   31.0182   26.4531
4     18.3212   24.8346   26.4531   32.4516


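Table 27.24 shows the results from a heterogeneous AR(1) structure. Here

$$\hat\sigma_1^2 = 30.7872, \quad \hat\sigma_2^2 = 39.3259, \quad \hat\sigma_3^2 = 31.0182, \quad \hat\sigma_4^2 = 32.4516$$

and

$$\hat\rho = \frac{29.0118}{\sqrt{30.7872 \cdot 39.3259}} = 0.8338$$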
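
As a consistency check on the heterogeneous AR(1) estimates, the sketch below (Python, illustrative names only) recovers the correlation from the first two variances and the (1, 2) covariance in Table 27.24 and then rebuilds the whole matrix from the five parameters. The variances are taken from the diagonal of Table 27.24.

```python
import math

# Diagonal of Table 27.24 and its (1, 2) element.
variances = [30.7872, 39.3259, 31.0182, 32.4516]
cov_12 = 29.0118

std_devs = [math.sqrt(v) for v in variances]
rho_hat = cov_12 / (std_devs[0] * std_devs[1])

# Heterogeneous AR(1): element (i, j) is rho ** |i - j| * sigma_i * sigma_j.
p = len(variances)
fitted = [
    [rho_hat ** abs(i - j) * std_devs[i] * std_devs[j] for j in range(p)]
    for i in range(p)
]

print(round(rho_hat, 4))
for row in fitted:
    print([round(value, 4) for value in row])
```

The rebuilt off-diagonal elements agree with Table 27.24 to about three decimals, which is a quick way to confirm that five parameters reproduce all ten distinct entries of the matrix.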


Tables 27.25-27.28 give the estimated covariance matrices when assuming a different
covariance matrix for each Drug. Table 27.25 is for compound symmetry structures.
Table 27.26 is for H-F structures. Table 27.27 is for the unstructured structures, and Table
27.28 is for AR(1) structures. Finding estimates of each of the individual covariance
parameters is left for the reader to do.

Table 27.29 shows the values of the AIC for each of the covariance structures considered 
above. One can see that the minimum value for AIC is 488.603 and this minimum occurs 
for the AR(1) structure. Thus under AIC, the AR(1) structure would be the best covariance 
structure to choose. 
TABLE 27.25

Three Estimated Covariance Matrices for Each Drug under Compound Symmetry

Estimated R Matrix for Person 1

Row   Col1      Col2      Col3      Col4
1     21.5313   15.8333   15.8333   15.8333
2     15.8333   21.5313   15.8333   15.8333
3     15.8333   15.8333   21.5313   15.8333
4     15.8333   15.8333   15.8333   21.5313

Estimated R Matrix for Person 9

Row   Col1      Col2      Col3      Col4
1     63.9063   53.2679   53.2679   53.2679
2     53.2679   63.9063   53.2679   53.2679
3     53.2679   53.2679   63.9063   53.2679
4     53.2679   53.2679   53.2679   63.9063

Estimated R Matrix for Person 17

Row   Col1      Col2      Col3      Col4
1     14.8170    8.8095    8.8095    8.8095
2      8.8095   14.8170    8.8095    8.8095
3      8.8095    8.8095   14.8170    8.8095
4      8.8095    8.8095    8.8095   14.8170

TABLE 27.26

Three Estimated Covariance Matrices for Each Drug under H-F Conditions

Estimated R Matrix for Person 1

Row   Col1      Col2      Col3      Col4
1     31.2012   20.4843   18.0424   22.7955
2     20.4843   21.1632   13.0234   17.7765
3     18.0424   13.0234   16.2794   15.3346
4     22.7955   17.7765   15.3346   25.7857

Estimated R Matrix for Person 9

Row   Col1      Col2      Col3      Col4
1     44.8443   63.7376   48.8587   46.8783
2     63.7376   103.91    78.3905   76.4101
3     48.8587   78.3905   74.1499   61.5311
4     46.8783   76.4101   61.5311   70.1891

Estimated R Matrix for Person 17

Row   Col1      Col2      Col3      Col4
1     16.6470    8.1668   11.6811    6.7731
2      8.1668   11.7014    9.2083    4.3003
3     11.6811    9.2083   18.7300    7.8146
4      6.7731    4.3003    7.8146    8.9140
TABLE 27.27

Three Estimated Covariance Matrices for Each Drug under Unstructured Conditions

Estimated R Matrix for Person 1

Row   Col1      Col2      Col3      Col4
1     24.0000   17.0000   16.7143   19.5000
2     17.0000   20.0000   12.2857   13.5000
3     16.7143   12.2857   16.0000   16.0000
4     19.5000   13.5000   16.0000   26.1250

Estimated R Matrix for Person 9

Row   Col1      Col2      Col3      Col4
1     43.9286   57.9643   44.5714   35.4286
2     57.9643   88.2679   68.2143   57.8571
3     44.5714   68.2143   60.0000   55.5714
4     35.4286   57.8571   55.5714   63.4286

Estimated R Matrix for Person 17

Row   Col1      Col2      Col3      Col4
1     23.6429   11.0000   15.1786    5.3571
2     11.0000    9.4286    7.4286    5.2857
3     15.1786    7.4286   17.6964    8.6071
4      5.3571    5.2857    8.6071    8.5000

TABLE 27.28

Three Estimated Covariance Matrices for Each Drug under AR(1) Conditions

Estimated R Matrix for Person 1

Row   Col1      Col2      Col3      Col4
1     23.0054   17.6563   13.5509   10.4001
2     17.6563   23.0054   17.6563   13.5509
3     13.5509   17.6563   23.0054   17.6563
4     10.4001   13.5509   17.6563   23.0054

Estimated R Matrix for Person 9

Row   Col1      Col2      Col3      Col4
1     57.3490   50.6923   44.8082   39.6071
2     50.6923   57.3490   50.6923   44.8082
3     44.8082   50.6923   57.3490   50.6923
4     39.6071   44.8082   50.6923   57.3490

Estimated R Matrix for Person 17

Row   Col1      Col2      Col3      Col4
1     15.1400    9.6852    6.1957    3.9634
2      9.6852   15.1400    9.6852    6.1957
3      6.1957    9.6852   15.1400    9.6852
4      3.9634    6.1957    9.6852   15.1400
TABLE 27.29

AIC for Each Covariance Structure

Type                           Criterion                 Fitstat
CS                             AIC (smaller is better)   492.797
HF                             AIC (smaller is better)   497.034
UN                             AIC (smaller is better)   497.372
AR(1)                          AIC (smaller is better)   488.603
CSH                            AIC (smaller is better)   497.514
ARH(1)                         AIC (smaller is better)   492.767
CS, Heterogeneous Groups       AIC (smaller is better)   492.735
HF, Heterogeneous Groups       AIC (smaller is better)   500.601
UN, Heterogeneous Groups       AIC (smaller is better)   511.261
AR(1), Heterogeneous Groups    AIC (smaller is better)   490.948

TABLE 27.30

AICC for Each Covariance Structure

Type                           Criterion                  Fitstat
CS                             AICC (smaller is better)   492.945
HF                             AICC (smaller is better)   497.804
UN                             AICC (smaller is better)   500.386
AR(1)                          AICC (smaller is better)   488.751
CSH                            AICC (smaller is better)   498.283
ARH(1)                         AICC (smaller is better)   493.537
CS, Heterogeneous Groups       AICC (smaller is better)   493.826
HF, Heterogeneous Groups       AICC (smaller is better)   507.660
UN, Heterogeneous Groups       AICC (smaller is better)   546.355
AR(1), Heterogeneous Groups    AICC (smaller is better)   492.039


Table 27.30 shows the values of the AICC for each of the covariance structures considered 
above. One can see that the minimum value for AICC is 488.751 and this minimum also 
occurs for the AR(1) structure. Thus under AICC, the AR(1) structure would be the best 
covariance structure to choose. 

Table 27.31 shows the values of the BIC for each of the covariance structures considered 
above. One can see that the minimum value for BIC is 490.959 and this minimum occurs 
for the AR(1) structure. Thus under BIC, the AR(1) structure would be the best covariance 
structure to choose. So under each of the three criteria, AIC, AICC, and BIC, the preferred 
covariance structure for the repeated measures is AR(1). 

Table 27.32 gives the values of $-2\log_e(L)$ for each covariance structure considered in this
example. Suppose one wants to use a likelihood ratio test to compare the compound
symmetry structure to the H-F structure. This can be done since compound symmetry is
a special case of the H-F structure. That is, suppose we want to test $H_0$: $\Sigma$ has CS structure
vs $H_a$: $\Sigma$ has H-F structure. The value of the LRT statistic is the difference in their
respective $-2\log_e(L)$ values. Under $H_0$, this result is an approximate chi-square test statistic with
degrees of freedom equal to the difference in the number of parameters in each covariance structure.
The CS structure has two parameters and the H-F structure has five parameters
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TABLE 27.31 

BIC for Each Covariance Structure 


Type 

Criterion 

Fitstat 

CS 

BIC (smaller is better) 

495.153 

HF 

BIC (smaller is better) 

502.925 

UN 

BIC (smaller is better) 

509.153 

AR(1) 

BIC (smaller is better) 

490.959 

CSH 

BIC (smaller is better) 

503.404 

ARH(l) 

BIC (smaller is better) 

498.658 

CS, Heterogenous Groups 

BIC (smaller is better) 

499.803 

HF, Heterogenous Groups 

BIC (smaller is better) 

518.272 

UN, Heterogenous Groups 

BIC (smaller is better) 

546.602 

AR(1), Heterogenous Group 

BIC (smaller is better) 

498.016 


TABLE 27.32 

Values of -21og e (L) for Each Covariance Structure 

Type 

Criterion 

Fitstat 

CS 

-2 Res Log Likelihood 

488.797 

HF 

-2 Res Log Likelihood 

487.034 

UN 

-2 Res Log Likelihood 

477.372 

AR(1) 

-2 Res Log Likelihood 

484.603 

CSH 

-2 Res Log Likelihood 

487.514 

ARH(l) 

-2 Res Log Likelihood 

482.767 

CS, Heterogenous Groups 

-2 Res Log Likelihood 

480.735 

HF, Heterogenous Groups 

-2 Res Log Likelihood 

470.601 

UN, Heterogenous Groups 

-2 Res Log Likelihood 

451.261 

AR(1), Heterogenous Group 

-2 Res Log Likelihood 

478.948 


when p = 4. Thus the chi-square test statistic is % 2 = 488.797 - 487.034 = 1.763 with 5-2 = 3 
degrees of freedom. The resulting observed significance level is a = 0.6230, and H 0 cannot 
be rejected. 

As a second example using an LRT, consider comparing the AR(1) structure to the ARH(1)
structure. The number of parameters in these two structures are 2 and 5, respectively,
when p = 4. Here $\chi^2 = 484.603 - 482.767 = 1.836$ with 3 degrees of freedom. The chi-square
critical point is $\chi^2_{0.05,3} = 7.815$ and since 1.836 < 7.815, one cannot reject the AR(1) structure in
favor of the ARH(1) structure.

As a third example, consider comparing the AR(1) structure to the UN structure. The number
of parameters in the AR(1) structure is 2, and the number of parameters in the UN
structure is p(p + 1)/2 = 10 when p = 4. Here $\chi^2 = 484.603 - 477.372 = 7.231$ with 10 - 2 = 8
degrees of freedom. The resulting observed significance level is $\hat{\alpha} = 0.5119$, and one cannot
reject an AR(1) structure in favor of the UN structure.

The three tests above are possible because one covariance structure is a special case of
the other. One cannot give an LRT that compares AR(1) to CS since neither one is a special
case of the other. Likewise one cannot get an LRT that compares ARH(1) to CSH.
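The three likelihood ratio tests above can be reproduced with a short calculation. The following is a minimal sketch using only the Python standard library; the helper `chi2_sf` is a hypothetical name for an upper-tail chi-square probability that is valid for integer degrees of freedom, and the $-2\log_e(L)$ values and parameter counts are those quoted in the text for p = 4.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: start from the df = 1 tail and add the half-integer terms.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * k + 1)
    return total

# (label, -2 log L reduced, -2 log L full, parameters reduced, parameters full)
tests = [
    ("CS vs H-F", 488.797, 487.034, 2, 5),
    ("AR(1) vs ARH(1)", 484.603, 482.767, 2, 5),
    ("AR(1) vs UN", 484.603, 477.372, 2, 10),
]

results = {}
for label, reduced, full, d_reduced, d_full in tests:
    stat = reduced - full          # LRT statistic
    df = d_full - d_reduced        # difference in number of covariance parameters
    results[label] = (stat, df, chi2_sf(stat, df))
    print(f"{label}: chi-square = {stat:.3f}, df = {df}, p = {results[label][2]:.4f}")
```

The same helper applies to any pair of nested covariance structures; it is not appropriate for non-nested pairs such as AR(1) versus CS, for the reason given above.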

Before considering tests about the fixed effects, suppose one fits a model where the
covariance structure is AR(1), but a random component corresponding to subject is added to
the model. That is, the model in Equation 27.20 is fitted where it is assumed the repeated
measures satisfy an AR(1) structure. This is equivalent to choosing a covariance structure
for the repeated measures that is equal to the sum of two matrices: a matrix with the
subject variance component in every position, plus an AR(1) covariance matrix.

Table 27.33 gives the SAS-Mixed commands used to fit a model with AR(1) structure on
the repeated measures and with a random subject effect, and Table 27.34 gives the values
of the fit statistics for the analysis using the commands in Table 27.33. Note that the values
of AIC, AICC, and BIC are all larger than they were for an AR(1) structure model without
a random subject effect. Thus for these drug data, the addition of a random subject term
cannot be recommended. In addition, the LRT statistic comparing the model with a random
subject term to the model without a random subject term is $\chi^2 = 484.6 - 483.7 = 0.9$.
Adding a random subject term to the model increases the number of parameters by 1.


TABLE 27.33

SAS-Mixed Commands for Adding a Random Subject Effect to a Repeated
Measures Experiment

PROC MIXED DATA=HR MAXITER=200;
  CLASSES DRUG TIME PERSON;
  MODEL HR=DRUG TIME DRUG*TIME;
  RANDOM PERSON;
  REPEATED TIME/TYPE=AR(1) SUBJECT=PERSON R;
  TITLE2 'Repeated Measures Analysis - Assuming Autoregressive Errors';
  ODS RTF SELECT R FITSTATISTICS COVPARMS;
RUN;


TABLE 27.34

Fit Statistics for a Repeated Measures
Model with a Random Subject Effect

Fit Statistics
-2 Res Log Likelihood      483.7
AIC (smaller is better)    489.7
AICC (smaller is better)   490.0
BIC (smaller is better)    493.2

Therefore $\chi^2 = 0.9$ would be compared with a chi-square critical point with 1 degree of
freedom. This critical point is $\chi^2_{0.05,1} = 3.84$, and adding a random subject term to the model
does not give a significantly better fit.

Next the data are analyzed to consider inferences about the Drug and Time effects. The
SAS-Mixed commands are shown in Table 27.35. The values of the test statistics given by
Equation 27.18 are shown in Table 27.36 for the Drug and Time main effects and the
Drug x Time interaction. The Drug x Time interaction effect is highly significant.
Hence the two-way Drug x Time means should be compared with one another within
each drug group and within each time level. The least squares means are shown in Table
27.37, pairwise comparisons for drugs within each time level are given in Table 27.38,
and pairwise comparisons among time levels for each drug are given in Table 27.39.


TABLE 27.35

Final SAS-Mixed Commands for Analyzing Fixed Effects

PROC MIXED DATA=HR MAXITER=200;
  CLASSES DRUG TIME PERSON;
  MODEL HR=DRUG TIME DRUG*TIME/DDFM=KR;
  REPEATED TIME/TYPE=AR(1) SUBJECT=PERSON;
  TITLE2 'Repeated Measures Analysis - Assuming Autoregressive Errors';
  LSMEANS DRUG TIME DRUG*TIME/PDIFF ADJUST=TUKEY;
  ODS OUTPUT DIFFS=DIFFS;
RUN;

PROC PRINT DATA=DIFFS; WHERE EFFECT='DRUG*TIME' AND DRUG=_DRUG;
  TITLE 'TIME DIFFERENCES FOR EACH DRUG';
RUN;

PROC PRINT DATA=DIFFS; WHERE EFFECT='DRUG*TIME' AND TIME=_TIME;
  TITLE 'DRUG DIFFERENCES AT EACH LEVEL OF TIME';
RUN;


TABLE 27.36

Type III Tests on Fixed Effects

Type III Tests of Fixed Effects

Effect        Num df   Den df   F-Value   Pr > F
Drug               2     22.8      6.52   0.0058
Time               3     63.1     15.43   <0.0001
Drug x time        6     63.3     12.71   <0.0001


TABLE 27.37

Least Squares Means for Each Effect in the Model

Least Squares Means

Effect        Drug   Time   Estimate   Standard Error     df   t-Value   Pr > |t|
Drug          A              76.2813           1.7870   22.8     42.69   <0.0001
Drug          B              81.0313           1.7870   22.8     45.34   <0.0001
Drug          C              71.9062           1.7870   22.8     40.24   <0.0001
Time                 1       75.0000           1.1636   33.7     64.46   <0.0001
Time                 2       78.9583           1.1636   33.7     67.86   <0.0001
Time                 3       77.0417           1.1636   33.7     66.21   <0.0001
Time                 4       74.6250           1.1636   33.7     64.13   <0.0001
Drug x time   A      1       70.5000           2.0154   33.7     34.98   <0.0001
Drug x time   A      2       80.5000           2.0154   33.7     39.94   <0.0001
Drug x time   A      3       81.0000           2.0154   33.7     40.19   <0.0001
Drug x time   A      4       73.1250           2.0154   33.7     36.28   <0.0001
Drug x time   B      1       81.7500           2.0154   33.7     40.56   <0.0001
Drug x time   B      2       84.0000           2.0154   33.7     41.68   <0.0001
Drug x time   B      3       78.6250           2.0154   33.7     39.01   <0.0001
Drug x time   B      4       79.7500           2.0154   33.7     39.57   <0.0001
Drug x time   C      1       72.7500           2.0154   33.7     36.10   <0.0001
Drug x time   C      2       72.3750           2.0154   33.7     35.91   <0.0001
Drug x time   C      3       71.5000           2.0154   33.7     35.48   <0.0001
Drug x time   C      4       71.0000           2.0154   33.7     35.23   <0.0001


TABLE 27.38

Comparisons between Drug Means for Each Time Level with Tukey-Kramer Adjusted
Significance Levels

Effect        Drug  Time  _Drug  _Time   Estimate   Standard Error     df   t-Value   Pr > |t|   Adjustment     Adj p
Drug x time   A     1     B      1       -11.2500           2.8502   33.7     -3.95   0.0004     Tukey-Kramer   0.0100
Drug x time   A     1     C      1        -2.2500           2.8502   33.7     -0.79   0.4354     Tukey-Kramer   0.9997
Drug x time   A     2     B      2        -3.5000           2.8502   33.7     -1.23   0.2280     Tukey-Kramer   0.9846
Drug x time   A     2     C      2         8.1250           2.8502   33.7      2.85   0.0074     Tukey-Kramer   0.1846
Drug x time   A     3     B      3         2.3750           2.8502   33.7      0.83   0.4106     Tukey-Kramer   0.9995
Drug x time   A     3     C      3         9.5000           2.8502   33.7      3.33   0.0021     Tukey-Kramer   0.0586
Drug x time   A     4     B      4        -6.6250           2.8502   33.7     -2.32   0.0263     Tukey-Kramer   0.4710
Drug x time   A     4     C      4         2.1250           2.8502   33.7      0.75   0.4611     Tukey-Kramer   0.9998
Drug x time   B     1     C      1         9.0000           2.8502   33.7      3.16   0.0033     Tukey-Kramer   0.0915
Drug x time   B     2     C      2        11.6250           2.8502   33.7      4.08   0.0003     Tukey-Kramer   0.0066
Drug x time   B     3     C      3         7.1250           2.8502   33.7      2.50   0.0175     Tukey-Kramer   0.3591
Drug x time   B     4     C      4         8.7500           2.8502   33.7      3.07   0.0042     Tukey-Kramer   0.1129


TABLE 27.39

Comparisons between Time Means for Each Drug with Tukey-Kramer Adjusted Significance Levels

Effect        Drug  Time  _Drug  _Time   Estimate   Standard Error     df   t-Value   Pr > |t|   Adjustment     Adj p
Drug x time   A     1     A      2       -10.0000           1.2315   62.7     -8.12   <0.0001    Tukey-Kramer   <0.0001
Drug x time   A     1     A      3       -10.5000           1.6645   72.1     -6.31   <0.0001    Tukey-Kramer   <0.0001
Drug x time   A     1     A      4        -2.6250           1.9503   79.3     -1.35   0.1822     Tukey-Kramer   0.9693
Drug x time   A     2     A      3        -0.5000           1.2315   62.7     -0.41   0.6861     Tukey-Kramer   1.0000
Drug x time   A     2     A      4         7.3750           1.6645   72.1      4.43   <0.0001    Tukey-Kramer   0.0021
Drug x time   A     3     A      4         7.8750           1.2315   62.7      6.39   <0.0001    Tukey-Kramer   <0.0001
Drug x time   B     1     B      2        -2.2500           1.2315   62.7     -1.83   0.0725     Tukey-Kramer   0.7970
Drug x time   B     1     B      3         3.1250           1.6645   72.1      1.88   0.0645     Tukey-Kramer   0.7681
Drug x time   B     1     B      4         2.0000           1.9503   79.3      1.03   0.3083     Tukey-Kramer   0.9965
Drug x time   B     2     B      3         5.3750           1.2315   62.7      4.36   <0.0001    Tukey-Kramer   0.0026
Drug x time   B     2     B      4         4.2500           1.6645   72.1      2.55   0.0128     Tukey-Kramer   0.3278
Drug x time   B     3     B      4        -1.1250           1.2315   62.7     -0.91   0.3645     Tukey-Kramer   0.9987
Drug x time   C     1     C      2         0.3750           1.2315   62.7      0.30   0.7617     Tukey-Kramer   1.0000
Drug x time   C     1     C      3         1.2500           1.6645   72.1      0.75   0.4551     Tukey-Kramer   0.9998
Drug x time   C     1     C      4         1.7500           1.9503   79.3      0.90   0.3723     Tukey-Kramer   0.9989
Drug x time   C     2     C      3         0.8750           1.2315   62.7      0.71   0.4800     Tukey-Kramer   0.9999
Drug x time   C     2     C      4         1.3750           1.6645   72.1      0.83   0.4115     Tukey-Kramer   0.9995
Drug x time   C     3     C      4         0.5000           1.2315   62.7      0.41   0.6861     Tukey-Kramer   1.0000


27.5 Summary

This chapter considered three alternative methods that can be used to analyze repeated
measures experiments when the split-plot-in-time analysis is not appropriate. The
methods considered were a MANOVA approach, an adjusted p-value approach, and a
mixed model approach. Which approach one chooses may depend on the statistical
software one has available. When mixed model software is available, this approach
is likely the easiest one to use. If one has trouble getting convergence
with mixed model procedures, then one can consider the MANOVA approach, provided
one's statistical software supports the MANOVA methods. If software is not
available for either of these two approaches, then one can consider using the adjusted
p-value approach.


27.6 Exercises 

27.1 Consider the rabbit data described in Exercise 26.1. 

1) Are the Huynh-Feldt conditions satisfied for these data? Justify your
answer.

2) What is the G-G estimate of Box's $\theta$ defined in Equation 27.12? What is the
H-F estimate of $\theta$?

Answer the following questions by treating the repeated measures as a
multivariate response vector.

3) Is there a significant Time x Trt interaction? Why or why not?

4) Are there differences between the three treatments? Are there time
differences? Explain your answers.

5) Which treatment seems to have the greatest effect on ear temperature
difference?

27.2 Consider the rabbit data described in Exercise 26.1. 

1) Based on the AIC criterion, which of the following covariance structures
would be selected: compound symmetry, heterogeneous compound symmetry,
AR(1), heterogeneous AR(1), or unstructured?

2) Based on the BIC criterion, which of the following covariance structures
would be selected: compound symmetry, heterogeneous compound symmetry,
AR(1), heterogeneous AR(1), or unstructured?

3) Based on the AICC criterion, which of the following covariance structures
would be selected: compound symmetry, heterogeneous compound symmetry,
AR(1), heterogeneous AR(1), or unstructured?

4) Using likelihood ratio tests compare:

a) Compound symmetry to heterogeneous compound symmetry.

b) Compound symmetry to unstructured.

c) Heterogeneous AR(1) to AR(1).

27.3 Consider the rabbit data described in Exercise 26.1. Answer questions 3-5 in 
Exercise 27.1 using a mixed model procedure assuming the covariance structure 
that you would recommend for these data. 

27.4 What covariance structure for the repeated measures would you recommend for 
the data in Exercise 26.3? Justify your answer. 





Case Studies: Complex Examples Having 
Repeated Measures 


This chapter considers several examples and their statistical analyses using mixed model 
procedures. The examples considered include split-plot designs with repeated measures, 
repeated measures nested within repeated measures, and a multilocation study. 


28.1 Complex Comfort Experiment 

An engineer had three environments in which to test two types of clothing. Since responses
to an environment also differ between males and females, sex of person was included as
a factor. Four people (two males and two females) were put into an environmental chamber
(which was assigned one of the three environments). One male and one female wore
clothing type 1, and the other male and female wore clothing type 2. The comfort score of
each person was recorded at the end of 1, 2, and 3 hours. The data for this experiment are
shown in Table 28.1.

There are three sizes of experimental units. The largest experimental unit is a chamber
or, equivalently, a group of four people. The experimental design for a chamber is a
one-way treatment structure (Environment) in a completely randomized
design structure with three replications at each level of the environment. The
middle-sized experimental unit is a person. The experimental design for a person is a two-way
treatment structure (Sex x Clothing) in a randomized complete block design structure in
nine blocks (each block contains four experimental units, people). The smallest
experimental unit is a 1 h time interval (Hour), which is a repeated measure. The experimental
design for Hour is a one-way treatment structure (Time) in a randomized complete
block design structure in 36 blocks (each block contains three experimental units,
1 h time intervals).


TABLE 28.1

Data for Complex Comfort Experiment

Env   Rep   Sex   Clo   Score 1   Score 2   Score 3
1     1     1     1     13.9001    7.5154   10.9742
1     1     1     2     18.3941   12.4152   15.2242
1     1     2     1     10.0149    3.7669    7.0326
1     1     2     2     16.4774   10.4104   13.1143
1     2     1     1     15.7185    9.7170   12.5080
1     2     1     2     19.7547   13.5293   16.5487
1     2     2     1     10.6902    4.8473    7.9829
1     2     2     2     17.1147   11.3858   14.1503
1     3     1     1     14.9015    9.1825   11.5819
1     3     1     2     18.0402   12.1005   15.4893
1     3     2     1     10.1944    4.1716    6.9688
1     3     2     2     16.0789   10.2357   12.4853
2     1     1     1     10.2881    6.9090    8.4138
2     1     1     2     13.8631   10.1492   12.5372
2     1     2     1      6.1634    2.2837    4.0052
2     1     2     2     13.0291    9.7775   11.6576
2     2     1     1     11.9904    8.4793    9.8695
2     2     1     2     14.8587   11.0317   12.8317
2     2     2     1      6.7562    2.5634    4.7547
2     2     2     2     13.8977    9.6643   11.6034
2     3     1     1      9.7589    6.1772    8.0785
2     3     1     2     13.5513    9.3052   11.5259
2     3     2     1      4.5203    0.5913    2.8939
2     3     2     2     12.5057    7.7502   10.5226
3     1     1     1      7.4205    7.1752    7.1218
3     1     1     2     12.5410   11.9157   12.2239
3     1     2     1      3.8293    3.5868    3.3004
3     1     2     2     11.0002   11.0282   10.5662
3     2     1     1     11.8158   11.9721   11.7187
3     2     1     2     16.4418   16.6355   16.6686
3     2     2     1      7.5707    7.3456    7.2404
3     2     2     2     13.5421   13.5672   14.0240
3     3     1     1      7.2364    7.8304    7.4147
3     3     1     2     12.0690   12.5003   12.1790
3     3     2     1      1.8330    1.6769    1.8065
3     3     2     2      9.4934    9.7000   10.0119


The model for this experiment (where the model is separated into three parts corresponding
to the three sizes of experimental units and where the repeated measures are
treated as a split-plot-in-time factor) is

$$
\begin{aligned}
y_{ijkm\ell} = {} & [\mu + E_i + \eta_{i\ell}] + [S_j + C_k + (SC)_{jk} + (ES)_{ij} + (EC)_{ik} + (ESC)_{ijk} + \delta_{ijk\ell}] \\
& + [T_m + (ET)_{im} + (ST)_{jm} + (CT)_{km} + (SCT)_{jkm} + (EST)_{ijm} + (ECT)_{ikm} + (ESCT)_{ijkm} + \varepsilon_{ijkm\ell}]
\end{aligned}
\tag{28.1}
$$

where $E_i$ denotes the ith environment, $S_j$ denotes the jth sex, $C_k$ denotes the kth clothing,
and $T_m$ denotes the mth time point, $\eta_{i\ell}$ denotes a random chamber effect with the
assumption that $\eta_{i\ell} \sim$ i.i.d. $N(0, \sigma_\eta^2)$, $\delta_{ijk\ell}$ denotes a random person effect with the assumption that
$\delta_{ijk\ell} \sim N(0, \sigma_\delta^2)$, and $\varepsilon_{ijkm\ell}$ denotes the random measurement error for a given hour with the
assumption that $\varepsilon_{ijkm\ell} \sim$ i.i.d. $N(0, \sigma_\varepsilon^2)$. It is also assumed that all random effects are
independently distributed. Also note that the first bracket of Equation 28.1 is the Chamber part
of the model, the second bracket is the Person part of the model, and the third bracket is
the Hour part of the model. If the split-plot-in-time assumption were not appropriate, then
the model would be


$$
\begin{aligned}
y_{ijkm\ell} = {} & \mu + E_i + \eta_{i\ell} + S_j + C_k + (SC)_{jk} + (ES)_{ij} + (EC)_{ik} + (ESC)_{ijk} + T_m + (ET)_{im} + (ST)_{jm} \\
& + (CT)_{km} + (SCT)_{jkm} + (EST)_{ijm} + (ECT)_{ikm} + (ESCT)_{ijkm} + \varepsilon_{ijkm\ell}
\end{aligned}
\tag{28.2}
$$

where it is assumed that $\eta_{i\ell} \sim$ i.i.d. $N(0, \sigma_\eta^2)$ and the vector of errors from the same
person, $(\varepsilon_{ijk1\ell}, \varepsilon_{ijk2\ell}, \varepsilon_{ijk3\ell})'$, is distributed i.i.d. $N(0, \Sigma)$,
and the analysis would depend on the structure assumed for $\Sigma$. Note that the model in
Equation 28.2 is the same as the model in Equation 28.1 except that the random effect
corresponding to Person, $\delta_{ijk\ell}$, has been removed from the model. One might want to include Person
as a random effect for some $\Sigma$, such as when $\Sigma$ has an AR(1) structure.

Should one wish to perform a split-plot-in-time analysis, the basic SAS-Mixed code
that can be used is shown in Table 28.2. The results of such an analysis will be left for the
reader to pursue. Such an analysis assumes the experimental design is a split-split-plot
design with Chamber as the whole plot experimental unit, Person as the subplot
experimental unit, and Time measurement as the sub-sub-plot experimental unit.

The approach that will be taken here is to treat the Time factor as a repeated measure,
and then consider various covariance structures for the repeated measures. The design then
would be described as a split-plot design with repeated measures on the sub-plot factor
(Person). The basic SAS-Mixed code that one would use when assuming compound
symmetry for the covariance structure of the repeated measures is shown in Table 28.3. Note
that the combination of Sex x Clo x Rep x Env is unique for each person, so this effect can
be used to identify a person on whom the repeated measures occur.


TABLE 28.2

SAS-Mixed Code to Analyze the Data in Table 28.1 Treating
the Repeated Measures Factor as a Split-Plot-In-Time Factor

PROC MIXED DATA=COMFORT;
  TITLE 'SPLIT-PLOT-IN-TIME ANALYSIS';
  CLASSES ENV REP SEX CLO TIME;
  MODEL SCORE=ENV|SEX|CLO|TIME/DDFM=SATTERTH;
  RANDOM REP(ENV) SEX*CLO*REP(ENV);
  LSMEANS ENV|SEX|CLO|TIME/PDIFF;
RUN;


TABLE 28.3

SAS-Mixed Code to Analyze the Data in Table 28.1 Using the Repeated
Option with a Compound Symmetry Covariance Structure

PROC MIXED DATA=COMFORT;
  TITLE 'ANALYSIS ASSUMES A SPLIT-PLOT DESIGN WITH REPEATED';
  TITLE2 'MEASURES ON THE SUBPLOT FACTOR';
  CLASSES ENV REP SEX CLO TIME;
  MODEL SCORE=ENV|SEX|CLO|TIME/DDFM=SATTERTH;
  RANDOM REP(ENV);
  REPEATED TIME/SUBJECT=SEX*CLO*REP*ENV TYPE=CS;
  LSMEANS ENV|SEX|CLO|TIME/PDIFF;
RUN;


Four structures for the covariance matrix of the repeated measures are considered for
these data: compound symmetry, AR(1), an unstructured covariance matrix, and AR(1)
with Person also included as a random effect. For the first three of these,
one just needs to change the TYPE= option in the REPEATED statement in Table 28.3. To
also include Person as a random factor, one would use the RANDOM and REPEATED statements
given below:

RANDOM REP(ENV) SEX*CLO*REP*ENV; 

REPEATED TIME/SUBJECT=SEX*CLO*REP*ENV TYPE=AR(1); 

The resulting fit statistics for each of the four covariance structures being considered are 
given in Table 28.4. 


TABLE 28.4

Fit Statistics for Four Covariance Structures

Description                Value   Type
-2 Res Log Likelihood      117.7   CS
-2 Res Log Likelihood      119.4   AR(1)
-2 Res Log Likelihood      107.5   UN
-2 Res Log Likelihood      117.6   AR(1) with random person error
AIC (smaller is better)    123.7   CS
AIC (smaller is better)    125.4   AR(1)
AIC (smaller is better)    121.5   UN
AIC (smaller is better)    125.6   AR(1) with random person error
AICC (smaller is better)   124.1   CS
AICC (smaller is better)   125.8   AR(1)
AICC (smaller is better)   123.3   UN
AICC (smaller is better)   126.2   AR(1) with random person error
BIC (smaller is better)    124.3   CS
BIC (smaller is better)    126.0   AR(1)
BIC (smaller is better)    122.9   UN
BIC (smaller is better)    126.4   AR(1) with random person error

Note: For comparing CS to UN, $\chi^2 = 117.7 - 107.5 = 10.2$ with 4 degrees of freedom.


TABLE 28.5

SAS-Mixed Code to Analyze the Data in Table 28.1 Using the Repeated
Option with an Unstructured Covariance Structure

PROC MIXED DATA=COMFORT;
  TITLE 'ANALYSIS ASSUMES A SPLIT-PLOT DESIGN WITH REPEATED';
  TITLE2 'MEASURES ON THE SUBPLOT FACTOR';
  CLASSES ENV REP SEX CLO TIME;
  MODEL SCORE=ENV|SEX|CLO|TIME/DDFM=SATTERTH;
  RANDOM REP(ENV);
  REPEATED TIME/SUBJECT=SEX*CLO*REP*ENV TYPE=UN;
  LSMEANS ENV|SEX|CLO|TIME/PDIFF;
RUN;


An examination of the AIC, AICC, and BIC fit statistics in Table 28.4 reveals that each
attains its minimum under the unstructured covariance matrix assumption. The chi-square
statistic for comparing compound symmetry to unstructured is $\chi^2 = 117.7 - 107.5 = 10.2$
with 6 - 2 = 4 degrees of freedom. The corresponding observed significance level is
$\hat{\alpha} = 0.0372$. Thus unstructured is significantly better than compound symmetry. The rest
of the analysis of this example will assume an unstructured covariance matrix for the
repeated measures. The SAS-Mixed code that produces an analysis of fixed effects table is
given in Table 28.5. The covariance parameter estimates for the analysis described in Table
28.5 are shown in Table 28.6. From Table 28.6, one can see that $\hat{\sigma}_\eta^2 = 2.4047$ and

$$
\hat{\Sigma} =
\begin{bmatrix}
0.2157 & 0.2256 & 0.1248 \\
0.2256 & 0.3525 & 0.1894 \\
0.1248 & 0.1894 & 0.1462
\end{bmatrix}
$$

The analysis of fixed effects table is given in Table 28.7. An examination of Table 28.7
reveals that the significant effects ($\hat{\alpha}$ < 0.05) are: Sex, Clo, Env x Clo, Sex x Clo, Env x Sex x Clo,
Time, and Env x Time. Since the three-way interaction, Env x Sex x Clo, is significant, one
should examine interesting pairwise comparisons among these three-way means. The only
interaction involving Time that is significant is the Env x Time interaction, so one should
also examine pairwise comparisons among these two-way means. In addition to the
pairwise comparisons described above, the analysis also considers linear and quadratic
comparisons among the time means for each environment. This is done for illustration
purposes here, and readers will need to determine for themselves whether such comparisons
are of interest to them for a study similar to this one. The additional SAS-Mixed
statements one will need to append to those in Table 28.5 are shown in Table 28.8. The
results from these additional statements are shown in Tables 28.9-28.16.


TABLE 28.6

Covariance Parameter Estimates for the Analysis in Table 28.5

Covariance Parameter Estimates

Covariance Parameter   Subject                 Estimate
Rep(Env)                                         2.4047
UN(1,1)                Env x Rep x Sex x Clo     0.2157
UN(2,1)                Env x Rep x Sex x Clo     0.2256
UN(2,2)                Env x Rep x Sex x Clo     0.3525
UN(3,1)                Env x Rep x Sex x Clo     0.1248
UN(3,2)                Env x Rep x Sex x Clo     0.1894
UN(3,3)                Env x Rep x Sex x Clo     0.1462


TABLE 28.7

Analysis of Fixed Effects

Type III Tests of Fixed Effects

Effect                   Num df   Den df   F-Value   Pr > F
Env                           2     6.11      3.25   0.1091
Sex                           1       18    484.00   <0.0001
Env x Sex                     2       18      0.65   0.5320
Clo                           1       18   1347.89   <0.0001
Env x Clo                     2       18      3.60   0.0484
Sex x Clo                     1       18     93.59   <0.0001
Env x Sex x Clo               2       18      3.69   0.0455
Time                          2       24   1667.76   <0.0001
Env x Time                    4       24    478.22   <0.0001
Sex x Time                    2       24      0.66   0.5248
Env x Sex x Time              4       24      0.75   0.5666
Clo x Time                    2       24      0.70   0.5059
Env x Clo x Time              4       24      1.12   0.3688
Sex x Clo x Time              2       24      1.62   0.2197
Env x Sex x Clo x Time        4       24      0.73   0.5811

Table 28.9 contains linear and quadratic contrasts in time averaged over the four
combinations of Sex x Clothing and Reps for each environment. An examination of Table 28.9
reveals that both the linear and quadratic contrasts are statistically significant for
environments 1 and 2, and that neither is significant for environment 3.
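The estimates in Table 28.9 are simply contrasts among the Environment x Time least squares means of Table 28.11, with coefficients (-1, 0, 1) for the linear contrast and (1, -2, 1) for the quadratic contrast. The short sketch below recomputes them; the names `lsmeans` and `contrast` are illustrative, and small differences in the last decimal place arise because the tabled means are rounded.

```python
# Environment x Time least squares means from Table 28.11 (times 1, 2, 3).
lsmeans = {
    1: (15.1066, 9.1065, 12.0050),
    2: (10.9319, 7.0568, 9.0578),
    3: (9.5661, 9.5778, 9.5230),
}

LINEAR = (-1, 0, 1)      # coefficients used in the ESTIMATE statements
QUADRATIC = (1, -2, 1)

def contrast(coefficients, means):
    """Linear combination of the time means for one environment."""
    return sum(c * m for c, m in zip(coefficients, means))

estimates = {}
for env, means in lsmeans.items():
    estimates[env] = (contrast(LINEAR, means), contrast(QUADRATIC, means))
    print(f"Env {env}: linear = {estimates[env][0]:8.4f}, "
          f"quadratic = {estimates[env][1]:8.4f}")
```

The standard errors and tests in Table 28.9 additionally depend on the estimated covariance matrix, so they are not reproduced by this calculation.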

Table 28.10 contains the Environment x Sex x Clothing least squares means. These means
are averaged over Reps and Time. Table 28.11 contains the Environment x Time least
squares means, and these means are averages over Reps, Clothing, and Sex. There are 12
Environment x Sex x Clothing means. If one were to make all pairwise comparisons among
these twelve means, it would require 66 comparisons. However, many of these pairwise
comparisons are not interesting to an experimenter. The ones that would be of interest
would generally be those that compare the levels of one of the three factors with one
another for each combination of the other two factors. That is, one would usually want to
compare clothing types for each combination of Environment x Sex (six comparisons),
sexes for each combination of Environment x Clothing (six comparisons), and environments
for each combination of Sex x Clothing (12 comparisons). Thus there are a total of 24
comparisons out of the 66 possible that are likely to be interesting to an experimenter. The
results of these comparisons are shown in Tables 28.12-28.14, respectively.
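The counts quoted above can be checked by enumeration. In the sketch below, a comparison is treated as interesting when the two cell means differ in exactly one of the three factors; the variable names are illustrative only.

```python
from collections import Counter
from itertools import combinations, product

FACTORS = ("Env", "Sex", "Clo")

# The 12 Environment x Sex x Clothing cells: 3 environments, 2 sexes, 2 clothing types.
cells = list(product((1, 2, 3), (1, 2), (1, 2)))
all_pairs = list(combinations(cells, 2))

def differing_factors(a, b):
    """Names of the factors on which two cells differ."""
    return [FACTORS[i] for i in range(len(FACTORS)) if a[i] != b[i]]

# Tally the pairs that differ in exactly one factor, by the factor being compared.
interesting = Counter()
for a, b in all_pairs:
    diff = differing_factors(a, b)
    if len(diff) == 1:
        interesting[diff[0]] += 1

print("cells:", len(cells))
print("all pairwise comparisons:", len(all_pairs))
print("comparisons of one factor within the other two:", dict(interesting))
print("total of interest:", sum(interesting.values()))
```

The tallies by factor correspond to the WHERE clauses used with the LSMDIFFS data set in Table 28.8, each of which holds two of the three factors equal.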

An examination of the comparisons in Table 28.12 reveals that clothing types are
significantly different for each combination of Environment x Sex, and an examination of Table 28.13
indicates that males are significantly different from females for each Environment x Clothing




TABLE 28.8

Additional SAS-Mixed Statements to Obtain Results for Comparisons of Interest

  LSMEANS ENV*SEX*CLO ENV*TIME/PDIFF;
  ODS LISTING EXCLUDE LSMEANS DIFFS;
  ODS OUTPUT LSMEANS=LSMS DIFFS=LSMDIFFS;
  ODS RTF EXCLUDE LSMEANS DIFFS;
  ESTIMATE 'LINEAR TIME FOR ENV 1' TIME -1 0 1 ENV*TIME -1 0 1 0 0 0 0 0 0;
  ESTIMATE 'LINEAR TIME FOR ENV 2' TIME -1 0 1 ENV*TIME 0 0 0 -1 0 1 0 0 0;
  ESTIMATE 'LINEAR TIME FOR ENV 3' TIME -1 0 1 ENV*TIME 0 0 0 0 0 0 -1 0 1;
  ESTIMATE 'QUAD TIME FOR ENV 1' TIME 1 -2 1 ENV*TIME 1 -2 1 0 0 0 0 0 0;
  ESTIMATE 'QUAD TIME FOR ENV 2' TIME 1 -2 1 ENV*TIME 0 0 0 1 -2 1 0 0 0;
  ESTIMATE 'QUAD TIME FOR ENV 3' TIME 1 -2 1 ENV*TIME 0 0 0 0 0 0 1 -2 1;
RUN;

PROC PRINT DATA=LSMS; WHERE EFFECT='ENV*SEX*CLO';
  TITLE4 'ENV*SEX*CLO LEAST SQUARES MEANS';
RUN;

PROC PRINT DATA=LSMS; WHERE EFFECT='ENV*TIME';
  TITLE4 'ENV*TIME LEAST SQUARES MEANS';
RUN;

PROC PRINT DATA=LSMDIFFS; WHERE EFFECT='ENV*SEX*CLO' AND ENV=_ENV AND SEX=_SEX;
  TITLE4 'CLOTHING DIFFERENCES FOR EACH COMBINATION OF ENVIRONMENT*SEX';
RUN;

PROC PRINT DATA=LSMDIFFS; WHERE EFFECT='ENV*SEX*CLO' AND ENV=_ENV AND CLO=_CLO;
  TITLE4 'SEX DIFFERENCES FOR EACH COMBINATION OF ENVIRONMENT*CLOTHING';
RUN;

PROC PRINT DATA=LSMDIFFS; WHERE EFFECT='ENV*SEX*CLO' AND SEX=_SEX AND CLO=_CLO;
  TITLE4 'ENVIRONMENT DIFFERENCES FOR EACH COMBINATION OF CLOTHING*SEX';
RUN;

PROC PRINT DATA=LSMDIFFS; WHERE EFFECT='ENV*TIME' AND TIME=_TIME;
  TITLE4 'ENVIRONMENT DIFFERENCES FOR EACH HOUR';
RUN;

PROC PRINT DATA=LSMDIFFS; WHERE EFFECT='ENV*TIME' AND ENV=_ENV;
  TITLE4 'TIME DIFFERENCES FOR EACH ENVIRONMENT';
RUN;


TABLE 28.9

Linear and Quadratic Contrasts in Time for Each Environment

Estimates

Label                   Estimate   Standard Error   df   t-Value   Pr > |t|
Linear time for Env 1    -3.1016          0.09673   24    -32.06   <0.0001
Linear time for Env 2    -1.8741          0.09673   24    -19.37   <0.0001
Linear time for Env 3   -0.04309          0.09673   24     -0.45   0.6600
Quad time for Env 1       8.8988           0.1735   24     51.28   <0.0001
Quad time for Env 2       5.8761           0.1735   24     33.86   <0.0001
Quad time for Env 3     -0.06653           0.1735   24     -0.38   0.7048


TABLE 28.10

Three-Way Least Squares Means for Environment, Sex, and Clothing

Effect            Env   Sex   Clo   Estimate   Standard Error     df   t-Value   Pr > |t|
Env x Sex x Clo   1     1     1      11.7777           0.9317   6.86     12.64   <0.0001
Env x Sex x Clo   1     1     2      15.7218           0.9317   6.86     16.87   <0.0001
Env x Sex x Clo   1     2     1       7.2966           0.9317   6.86      7.83   0.0001
Env x Sex x Clo   1     2     2      13.4947           0.9317   6.86     14.48   <0.0001
Env x Sex x Clo   2     1     1       8.8849           0.9317   6.86      9.54   <0.0001
Env x Sex x Clo   2     1     2      12.1838           0.9317   6.86     13.08   <0.0001
Env x Sex x Clo   2     2     1       3.8369           0.9317   6.86      4.12   0.0047
Env x Sex x Clo   2     2     2      11.1565           0.9317   6.86     11.97   <0.0001
Env x Sex x Clo   3     1     1       8.8562           0.9317   6.86      9.51   <0.0001
Env x Sex x Clo   3     1     2      13.6861           0.9317   6.86     14.69   <0.0001
Env x Sex x Clo   3     2     1       4.2433           0.9317   6.86      4.55   0.0028
Env x Sex x Clo   3     2     2      11.4370           0.9317   6.86     12.28   <0.0001


TABLE 28.11

Two-Way Least Squares Means for Environment and Time

Least Squares Means

Effect       Env   Time   Estimate   Standard Error     df   t-Value   Pr > |t|
Env x Time   1     1       15.1066           0.9053   6.13     16.69   <0.0001
Env x Time   1     2        9.1065           0.9116    6.3      9.99   <0.0001
Env x Time   1     3       12.0050           0.9021   6.04     13.31   <0.0001
Env x Time   2     1       10.9319           0.9053   6.13     12.08   <0.0001
Env x Time   2     2        7.0568           0.9116    6.3      7.74   0.0002
Env x Time   2     3        9.0578           0.9021   6.04     10.04   <0.0001
Env x Time   3     1        9.5661           0.9053   6.13     10.57   <0.0001
Env x Time   3     2        9.5778           0.9116    6.3     10.51   <0.0001
Env x Time   3     3        9.5230           0.9021   6.04     10.56   <0.0001


TABLE 28.12

Comparisons between Clothing Types for Each Combination of Environment and Sex

Effect           Env  Sex  Clo  Time  Env  Sex  Clo  Time  Estimate  Standard Error  df  t-Value  Pr > |t|
Env x Sex x Clo    1    1    1     —    1    1    2     —   -3.9441          0.3646  18   -10.82   <0.0001
Env x Sex x Clo    1    2    1     —    1    2    2     —   -6.1981          0.3646  18   -17.00   <0.0001
Env x Sex x Clo    2    1    1     —    2    1    2     —   -3.2988          0.3646  18    -9.05   <0.0001
Env x Sex x Clo    2    2    1     —    2    2    2     —   -7.3196          0.3646  18   -20.08   <0.0001
Env x Sex x Clo    3    1    1     —    3    1    2     —   -4.8299          0.3646  18   -13.25   <0.0001
Env x Sex x Clo    3    2    1     —    3    2    2     —   -7.1938          0.3646  18   -19.73   <0.0001

TABLE 28.13

Comparisons between Sexes for Each Combination of Environment and Clothing Type

Effect           Env  Sex  Clo  Time  Env  Sex  Clo  Time  Estimate  Standard Error  df  t-Value  Pr > |t|
Env x Sex x Clo    1    1    1     —    1    2    1     —    4.4811          0.3646  18    12.29   <0.0001
Env x Sex x Clo    1    1    2     —    1    2    2     —    2.2270          0.3646  18     6.11   <0.0001
Env x Sex x Clo    2    1    1     —    2    2    1     —    5.0481          0.3646  18    13.85   <0.0001
Env x Sex x Clo    2    1    2     —    2    2    2     —    1.0273          0.3646  18     2.82    0.0114
Env x Sex x Clo    3    1    1     —    3    2    1     —    4.6129          0.3646  18    12.65   <0.0001
Env x Sex x Clo    3    1    2     —    3    2    2     —    2.2490          0.3646  18     6.17   <0.0001


TABLE 28.14

Comparisons between Environments for Each Combination of Sex and Clothing Type

Effect           Env  Sex  Clo  Time  Env  Sex  Clo  Time  Estimate  Standard Error    df  t-Value  Pr > |t|
Env x Sex x Clo    1    1    1     —    2    1    1     —    2.8927          1.3176  6.86     2.20    0.0649
Env x Sex x Clo    1    1    1     —    3    1    1     —    2.9215          1.3176  6.86     2.22    0.0629
Env x Sex x Clo    1    1    2     —    2    1    2     —    3.5380          1.3176  6.86     2.69    0.0319
Env x Sex x Clo    1    1    2     —    3    1    2     —    2.0357          1.3176  6.86     1.55    0.1671
Env x Sex x Clo    1    2    1     —    2    2    1     —    3.4597          1.3176  6.86     2.63    0.0347
Env x Sex x Clo    1    2    1     —    3    2    1     —    3.0533          1.3176  6.86     2.32    0.0543
Env x Sex x Clo    1    2    2     —    2    2    2     —    2.3383          1.3176  6.86     1.77    0.1201
Env x Sex x Clo    1    2    2     —    3    2    2     —    2.0577          1.3176  6.86     1.56    0.1632
Env x Sex x Clo    2    1    1     —    3    1    1     —   0.02878          1.3176  6.86     0.02    0.9832
Env x Sex x Clo    2    1    2     —    3    1    2     —   -1.5023          1.3176  6.86    -1.14    0.2924
Env x Sex x Clo    2    2    1     —    3    2    1     —   -0.4064          1.3176  6.86    -0.31    0.7669
Env x Sex x Clo    2    2    2     —    3    2    2     —   -0.2806          1.3176  6.86    -0.21    0.8376


combination. Finally, an examination of Table 28.14 reveals that environments 1 and 2
are significantly different only for the combinations Sex = 1, Clo = 2 and
Sex = 2, Clo = 1.

If one were to look at all possible pairwise comparisons among the nine
Environment x Time combinations, 36 comparisons would be required. Only 18 of these would
likely be of interest to an experimenter. Table 28.15 contains comparisons among the environment
means for each time (these means are averages over reps, sex, and clothing), and Table 28.16
contains comparisons among the time means for each environment (these means are also
averages over reps, sex, and clothing). An examination of Table 28.15 reveals that, at time 1,
environment 1 is significantly different from both environments 2 and 3, and that
environments 2 and 3 are not significantly different. At times 2 and 3, none of the three
environments are significantly different from one another. An examination of Table 28.16 reveals
that all time points are significantly different from one another for environments 1 and 2,
and that no time points are significantly different from one another for environment 3.
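
The time comparisons in Table 28.16 and the trend contrasts in Table 28.9 can be recomputed from the Environment x Time least squares means in Table 28.11. The following Python sketch is offered only as an arithmetic check. The function names are hypothetical, and the contrast coefficients (-1, 0, 1) and (1, -2, 1) are an assumption; they happen to reproduce the tabled estimates to within rounding.

```python
# Environment x Time least squares means copied from Table 28.11.
lsmeans = {
    1: (15.1066, 9.1065, 12.0050),
    2: (10.9319, 7.0568, 9.0578),
    3: (9.5661, 9.5778, 9.5230),
}


def pairwise_time_differences(means):
    """Return {(a, b): mean_a - mean_b} for the time pairs (1,2), (1,3), (2,3)."""
    t1, t2, t3 = means
    return {(1, 2): t1 - t2, (1, 3): t1 - t3, (2, 3): t2 - t3}


def trend_contrasts(means):
    """Linear (-1, 0, 1) and quadratic (1, -2, 1) contrasts (assumed coefficients)."""
    t1, t2, t3 = means
    return {"linear": t3 - t1, "quadratic": t1 - 2.0 * t2 + t3}


def t_value(estimate, std_error):
    """Test statistic for an estimate: the estimate divided by its standard error."""
    return estimate / std_error


for env, means in lsmeans.items():
    print(env, pairwise_time_differences(means), trend_contrasts(means))
```

Dividing any estimate by its tabled standard error with `t_value` gives the corresponding t-value; the standard errors themselves come from the fitted mixed model and cannot be recovered from the means alone.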

A plot showing the time means for each environment is shown in Figure 28.1. This plot
was created by the SAS commands shown in Table 28.17. The plot illustrates the
statements made about the time comparisons in Table 28.16.
TABLE 28.15

Comparisons between Environments for Each Time

Effect      Env  Sex  Clo  Time  Env  Sex  Clo  Time  Estimate  Standard Error    df  t-Value  Pr > |t|
Env x Time    1    —    —     1    2    —    —     1    4.1747          1.2803  6.13     3.26    0.0167
Env x Time    1    —    —     1    3    —    —     1    5.5405          1.2803  6.13     4.33    0.0047
Env x Time    1    —    —     2    2    —    —     2    2.0496          1.2891  6.3      1.59    0.1606
Env x Time    1    —    —     2    3    —    —     2   -0.4714          1.2891  6.3     -0.37    0.7266
Env x Time    1    —    —     3    2    —    —     3    2.9472          1.2757  6.04     2.31    0.0599
Env x Time    1    —    —     3    3    —    —     3    2.4820          1.2757  6.04     1.95    0.0993
Env x Time    2    —    —     1    3    —    —     1    1.3658          1.2803  6.13     1.07    0.3263
Env x Time    2    —    —     2    3    —    —     2   -2.5210          1.2891  6.3     -1.96    0.0960
Env x Time    2    —    —     3    3    —    —     3   -0.4652          1.2757  6.04    -0.36    0.7278


TABLE 28.16

Comparisons between Times for Each Environment

Effect      Env  Sex  Clo  Time  Env  Sex  Clo  Time  Estimate  Standard Error  df  t-Value  Pr > |t|
Env x Time    1    —    —     1    1    —    —     2    6.0002         0.09872  24    60.78   <0.0001
Env x Time    1    —    —     1    1    —    —     3    3.1016         0.09673  24    32.06   <0.0001
Env x Time    1    —    —     2    1    —    —     3   -2.8986         0.09995  24   -29.00   <0.0001
Env x Time    2    —    —     1    2    —    —     2    3.8751         0.09872  24    39.25   <0.0001
Env x Time    2    —    —     1    2    —    —     3    1.8741         0.09673  24    19.37   <0.0001
Env x Time    2    —    —     2    2    —    —     3   -2.0010         0.09995  24   -20.02   <0.0001
Env x Time    3    —    —     1    3    —    —     2   -0.0117         0.09872  24    -0.12    0.9064
Env x Time    3    —    —     1    3    —    —     3   0.04309         0.09673  24     0.45    0.6600
Env x Time    3    —    —     2    3    —    —     3   0.05481         0.09995  24     0.55    0.5885


FIGURE 28.1 Response over time for each environment.
TABLE 28.17

Additional SAS Statements to Obtain Plots for Time
Comparisons within Each Environment

SYMBOL1 V=DOT I=JOIN COLOR=BLACK;
SYMBOL2 V=CIRCLE I=JOIN COLOR=BLACK;
SYMBOL3 V=TRIANGLE I=JOIN COLOR=BLACK;
PROC GPLOT DATA=LSMS; WHERE EFFECT='ENV*TIME';
TITLE;
PLOT ESTIMATE*TIME=ENV;
RUN;


28.2 Family Attitudes Experiment 

The attitudes of families from rural and urban environments were measured every six
months for three time periods. The data were obtained from seven rural families and
10 urban families, with each family consisting of a son, a father, and a mother. The data are
given in Table 28.18. Note that in this example there are two sets of repeated measures:
the three time periods and the three family members. Family member is a repeated measure
since one cannot randomly assign father, mother, and son to the three family members.
This experiment can be described as a one-way experiment in a completely randomized
design with two sets of repeated measures, family member and time nested within
family member.


TABLE 28.18

Data for Family Attitude Study

                         Family Member
               Son            Father           Mother
Family    T1   T2   T3    T1   T2   T3    T1   T2   T3
Urban
   1      17   17   19    18   19   21    16   16   18
   2      12   14   15    19   19   21    16   16   18
   3       8   10   11    16   18   19    11   12   12
   4       5    7    7    12   12   13    13   14   14
   5       2    5    6    12   14   14    14   16   18
   6       9   11   11    16   17   18    14   15   16
   7       8    9    9    19   20   20    15   16   18
   8      13   14   16    16   17   18    18   18   20
   9      11   12   13    13   16   17     7    8   10
  10      19   20   20    13   15   15    11   12   12
Rural
   1      12   11   14    18   19   22    16   16   19
   2      13   13   17    16   15   19    19   19   23
   3      12   13   16    19   18   22    17   16   20
   4      18   18   21    23   23   26    23   22   26
   5      15   14   16    15   15   19    17   17   20
   6       6    6   10    15   16   19    18   19   21
   7      16   17   18    17   17   21    18   20   23

The treatment structure for this experiment is a three-way treatment structure with
factors area (A: rural vs urban), family member (M: son, father, mother), and time (T: 1, 2,
3). The design structure is a completely randomized design with two sets of repeated
measures. One model that might represent this experiment is given by

\[
y_{ijk\ell} = \mu + A_i + \eta_{i\ell} + M_j + (AM)_{ij} + \delta_{ij\ell} + T_k + (AT)_{ik} + (MT)_{jk} + (AMT)_{ijk} + \varepsilon_{ijk\ell} \tag{28.3}
\]

i = 1, 2; j = 1, 2, 3; k = 1, 2, 3

Under split-plot-in-time assumptions on both sets of repeated measures one would
have $\eta_{i\ell}$ denoting a random family effect with the assumption that $\eta_{i\ell} \sim$ i.i.d. $N(0, \sigma_\eta^2)$,
$\delta_{ij\ell}$ denoting a random person effect with the assumption that $\delta_{ij\ell} \sim N(0, \sigma_\delta^2)$, and $\varepsilon_{ijk\ell}$
denoting the random measurement error for a given time point with the assumption that
$\varepsilon_{ijk\ell} \sim$ i.i.d. $N(0, \sigma_\varepsilon^2)$. The SAS-Mixed commands to perform an analysis under the above
assumptions are shown in Table 28.19. However, it is up to the readers to perform this
analysis should they wish to do so. The analysis using the commands in Table 28.19 would
be similar to the analysis from a split-split-plot experiment. See Example 24.5 for such
an example.
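
The covariance pattern implied by the split-plot-in-time assumptions can be written out explicitly. The derivation below is a sketch that assumes the random family, person, and measurement-error effects are mutually independent, with variances written here as $\sigma_\eta^2$, $\sigma_\delta^2$, and $\sigma_\varepsilon^2$. The symbols $J_n$ (an $n \times n$ matrix of ones), $I_n$ (the identity matrix), and $\mathbf{y}_{i\ell}$ (the nine responses of one family, ordered with time varying fastest within family member) are notation introduced only for this illustration.

```latex
\begin{aligned}
\operatorname{Var}(y_{ijk\ell}) &= \sigma_\eta^2 + \sigma_\delta^2 + \sigma_\varepsilon^2, \\
\operatorname{Cov}(y_{ijk\ell},\, y_{ijk'\ell}) &= \sigma_\eta^2 + \sigma_\delta^2
    \quad (k \ne k',\ \text{same person, different times}), \\
\operatorname{Cov}(y_{ijk\ell},\, y_{ij'k'\ell}) &= \sigma_\eta^2
    \quad (j \ne j',\ \text{different persons in the same family}), \\
\operatorname{Var}(\mathbf{y}_{i\ell}) &= \sigma_\eta^2 J_9
    + \sigma_\delta^2 \,(I_3 \otimes J_3) + \sigma_\varepsilon^2 I_9 .
\end{aligned}
```

Observations from different families are uncorrelated under these assumptions, so the full covariance matrix is block diagonal with one such 9 x 9 block per family.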


TABLE 28.19

SAS Commands to Analyze the Data in Table 28.18 Assuming That Both
Sets of Repeated Measures Satisfy Split-Plot-In-Time Assumptions

/* One record per family: area, family number, then nine scores  */
/* (son T1-T3, father T1-T3, mother T1-T3).                      */
DATA ATTITUDEES;
INPUT A $ F ATTITUDE1-ATTITUDE9;
/* Renumber rural families 11-17 so family numbers are unique.   */
IF A='RURAL' THEN F=F+10;
CARDS;
URBAN  1 17 17 19 18 19 21 16 16 18
URBAN  2 12 14 15 19 19 21 16 16 18
URBAN  3  8 10 11 16 18 19 11 12 12
URBAN  4  5  7  7 12 12 13 13 14 14
URBAN  5  2  5  6 12 14 14 14 16 18
URBAN  6  9 11 11 16 17 18 14 15 16
URBAN  7  8  9  9 19 20 20 15 16 18
URBAN  8 13 14 16 16 17 18 18 18 20
URBAN  9 11 12 13 13 16 17  7  8 10
URBAN 10 19 20 20 13 15 15 11 12 12
RURAL  1 12 11 14 18 19 22 16 16 19
RURAL  2 13 13 17 16 15 19 19 19 23
RURAL  3 12 13 16 19 18 22 17 16 20
RURAL  4 18 18 21 23 23 26 23 22 26
RURAL  5 15 14 16 15 15 19 17 17 20
RURAL  6  6  6 10 15 16 19 18 19 21
RURAL  7 16 17 18 17 17 21 18 20 23
;
PROC PRINT;
RUN;

/* Convert to one record per family member and time point.       */
DATA USUAL; SET ATTITUDEES; DROP ATTITUDE1-ATTITUDE9;
M='S'; T=0;  ATTITUDE=ATTITUDE1; OUTPUT;
M='S'; T=6;  ATTITUDE=ATTITUDE2; OUTPUT;
M='S'; T=12; ATTITUDE=ATTITUDE3; OUTPUT;
M='F'; T=0;  ATTITUDE=ATTITUDE4; OUTPUT;
M='F'; T=6;  ATTITUDE=ATTITUDE5; OUTPUT;
M='F'; T=12; ATTITUDE=ATTITUDE6; OUTPUT;
M='M'; T=0;  ATTITUDE=ATTITUDE7; OUTPUT;
M='M'; T=6;  ATTITUDE=ATTITUDE8; OUTPUT;
M='M'; T=12; ATTITUDE=ATTITUDE9; OUTPUT;
RUN;

PROC MIXED DATA=USUAL;
TITLE 'ANALYSIS ASSUMING SPLIT-PLOT-IN-TIME ASSUMPTIONS';
TITLE2 'FOR BOTH SETS OF REPEATED MEASURES';
CLASSES A M T F;
MODEL ATTITUDE=A|M|T/DDFM=BETWITHIN;
RANDOM F(A) M*F(A);
LSMEANS A*T/PDIFF;
RUN;
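
The second DATA step in Table 28.19 turns each family's single record of nine scores into nine records, one per family member and time point. The same restructuring is sketched below in Python for readers who want to follow the logic outside SAS. The function name `wide_to_long` and the dictionary layout are hypothetical choices made for this illustration; the variable names A, F, M, T, and ATTITUDE mirror those in the SAS listing.

```python
def wide_to_long(area, family, attitudes):
    """Expand one family's nine wide scores into nine long-format records.

    The scores are assumed to be ordered as in Table 28.19:
    son T1-T3, father T1-T3, mother T1-T3.
    """
    members = ("S", "F", "M")   # son, father, mother
    months = (0, 6, 12)         # measurement times in months
    if len(attitudes) != 9:
        raise ValueError("expected nine attitude scores")
    # Rural families are renumbered 11-17 so that family identifiers
    # are unique across the two areas, as in the SAS DATA step.
    family_id = family + 10 if area == "RURAL" else family
    records = []
    for m_index, member in enumerate(members):
        for t_index, month in enumerate(months):
            records.append({
                "A": area,
                "F": family_id,
                "M": member,
                "T": month,
                "ATTITUDE": attitudes[3 * m_index + t_index],
            })
    return records


rows = wide_to_long("URBAN", 1, [17, 17, 19, 18, 19, 21, 16, 16, 18])
for row in rows:
    print(row)
```

Each family contributes nine rows, so the 17 families yield 153 records for the mixed-model analysis.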


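The analyses in this section are based on the model

\[
y_{ijk\ell} = \mu + A_i + M_j + (AM)_{ij} + T_k + (AT)_{ik} + (MT)_{jk} + (AMT)_{ijk} + \varepsilon_{ijk\ell} \tag{28.4}
\]

i = 1, 2; j = 1, 2, 3; k = 1, 2, 3

Let

\[
\boldsymbol{\varepsilon}_{i\ell} = \left( \varepsilon_{i11\ell},\ \varepsilon_{i12\ell},\ \varepsilon_{i13\ell},\ \varepsilon_{i21\ell},\ \varepsilon_{i22\ell},\ \varepsilon_{i23\ell},\ \varepsilon_{i31\ell},\ \varepsilon_{i32\ell},\ \varepsilon_{i33\ell} \right)'
\]

be the vector of errors for the $\ell$th family in the $i$th area.

Assume that $\boldsymbol{\varepsilon}_{i\ell} \sim N(\mathbf{0}, \Sigma)$, and the statistical analysis will depend upon the structure
assumed for $\Sigma$. The covariance matrix $\Sigma$ is a 9 x 9 matrix. Such a set-up ignores the nesting
structure of the repeated measures and simply assumes that there are nine repeated
measures, and all of the structures that were considered in Chapter 27 would be possible
here. Structures that take into account the nesting of the repeated measures assume that

\[
\Sigma = \operatorname{Var}(\boldsymbol{\varepsilon}_{i\ell}) =
\begin{bmatrix}
\theta_{11} V & \theta_{12} V & \theta_{13} V \\
\theta_{21} V & \theta_{22} V & \theta_{23} V \\
\theta_{31} V & \theta_{32} V & \theta_{33} V
\end{bmatrix}
= \Theta \otimes V \quad \text{(say)}
\]

where $\Theta \otimes V$ represents the direct product of the two variance covariance matrices

\[
\Theta =
\begin{bmatrix}
\theta_{11} & \theta_{12} & \theta_{13} \\
\theta_{21} & \theta_{22} & \theta_{23} \\
\theta_{31} & \theta_{32} & \theta_{33}
\end{bmatrix}
\quad \text{and} \quad
V =
\begin{bmatrix}
v_{11} & v_{12} & v_{13} \\
v_{21} & v_{22} & v_{23} \\
v_{31} & v_{32} & v_{33}
\end{bmatrix}
\]

The SAS-Mixed procedure will allow the user to select such a covariance structure when
$\Theta$ is unstructured and $V$ is compound symmetric, AR(1), or unstructured. Analyzing
the data as a split-split-plot experiment is equivalent to assuming that both $\Theta$ and $V$ satisfy
the compound symmetry structure.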

The data in Table 28.18 were analyzed under four different covariance structures with
the first three being obtained by taking the direct product of a covariance matrix
corresponding to family member and a covariance matrix corresponding to time. The structure
considered for family member was unstructured, and the structures considered for time
were compound symmetry, AR(1), and unstructured. The fourth covariance structure
considered was an unstructured 9 x 9 covariance matrix that treats all combinations of family
member and time as nine repeated measures. The resulting fit statistics are examined in
order to determine the best covariance structure to consider when comparing the fixed
effects of area, family member, and time and all possible interactions of these effects. The
SAS-Mixed commands used are given in Table 28.20 and continued in Table 28.21. It might
be noted that the option DDFM=BETWITHIN was used in all analyses rather than the
DDFM=KR option. For the fit statistics that are obtained, it does not matter which of these
two options is used. However, if one is going to consider tests on the fixed effects under
a direct product covariance structure, using DDFM=KR fails to give reasonable results
since the 9 x 9 estimated covariance matrix is singular under the direct product structures.
The results for the fit statistics are given in Table 28.22. In the two cases where $V$ was
assumed to have compound symmetry or AR(1), the Mixed routine did not produce any
results because it obtained an infinite likelihood value. This seems to have happened
because of the default starting values for the covariance parameters that the Mixed
procedure chose. To force the Mixed procedure to begin with different starting values, a PARMS
option was included. The statement

PARMS 10, 1, 10, 1, 1, 10, .5;

selects starting values for the parameters in

\[
\Theta =
\begin{bmatrix}
\theta_{11} & \theta_{12} & \theta_{13} \\
\theta_{21} & \theta_{22} & \theta_{23} \\
\theta_{31} & \theta_{32} & \theta_{33}
\end{bmatrix}
\quad \text{and} \quad
V =
\begin{bmatrix}
1 & \rho & \rho \\
\rho & 1 & \rho \\
\rho & \rho & 1
\end{bmatrix}
\]

as $\theta_{11} = 10$, $\theta_{21} = 1$, $\theta_{22} = 10$, $\theta_{31} = 1$, $\theta_{32} = 1$, $\theta_{33} = 10$, and $\rho = 0.5$. Without loss of generality the
diagonal elements in $V$ can be taken to be equal to 1. The same starting values were also
used for the case when $V$ has AR(1) structure. In this case,

\[
V =
\begin{bmatrix}
1 & \rho & \rho^2 \\
\rho & 1 & \rho \\
\rho^2 & \rho & 1
\end{bmatrix}
\]

and again without loss of generality, the diagonal elements in $V$ can be taken as equal to 1.


TABLE 28.20

SAS Commands Used to Compare Covariance Structures

PROC MIXED DATA=USUAL MAXITER=500;
TITLE 'ANALYSIS ASSUMING TYPE=UN@UN';
CLASSES A M T F;
MODEL ATTITUDE=A|M|T/DDFM=BETWITHIN;
REPEATED M T/TYPE=UN@UN R SUBJECT=F;
ODS LISTING EXCLUDE ALL;
ODS OUTPUT FITSTATISTICS=FIT1;
RUN;

/* TYPE is padded to eight characters so that the longer labels  */
/* are not truncated when the four data sets are stacked.        */
DATA FIT1; SET FIT1; TYPE='UN@UN   ';
RUN;

PROC MIXED DATA=USUAL MAXITER=500;
TITLE 'ANALYSIS ASSUMING TYPE=UN@CS';
CLASSES A M T F;
MODEL ATTITUDE=A|M|T/DDFM=BETWITHIN;
REPEATED M T/TYPE=UN@CS R SUBJECT=F;
PARMS 10, 1, 10, 1, 1, 10, .5;
ODS LISTING EXCLUDE ALL;
ODS OUTPUT FITSTATISTICS=FIT2;
RUN;

DATA FIT2; SET FIT2; TYPE='UN@CS   ';
RUN;

PROC MIXED DATA=USUAL MAXITER=500;
TITLE 'ANALYSIS ASSUMING TYPE=UN@AR(1)';
CLASSES A M T F;
MODEL ATTITUDE=A|M|T/DDFM=BETWITHIN;
REPEATED M T/TYPE=UN@AR(1) R SUBJECT=F;
PARMS 10, 1, 10, 1, 1, 10, .5;
ODS LISTING EXCLUDE ALL;
ODS OUTPUT FITSTATISTICS=FIT3;
RUN;

DATA FIT3; SET FIT3; TYPE='UN@AR(1)';
RUN;

PROC MIXED DATA=USUAL MAXITER=500;
TITLE 'ANALYSIS ASSUMING TYPE=UN 9X9 COVARIANCE MATRIX';
CLASSES A M T F;
MODEL ATTITUDE=A|M|T/DDFM=KR;
REPEATED M*T/TYPE=UN R SUBJECT=F;
ODS LISTING EXCLUDE ALL;
ODS OUTPUT FITSTATISTICS=FIT4;
RUN;

DATA FIT4; SET FIT4; TYPE='UN 9X9  ';
RUN;


TABLE 28.21

SAS Commands Used to Compare Covariance Structures (Continued)

DATA FITSTATS; SET FIT1 FIT2 FIT3 FIT4;
PROC SORT; BY DESCR;
ODS RTF FILE='C:\TEMP1.RTF';
PROC PRINT DATA=FITSTATS;
TITLE 'FIT STATISTICS';
ODS LISTING SELECT ALL;
ODS RTF SELECT ALL;
RUN;
ODS RTF CLOSE;

An examination of the AIC, AICC, and BIC values in Table 28.22 reveals that all three of
these criteria achieve their respective minimums for the covariance structure $\Theta \otimes V$ where
$\Theta$ is unstructured and $V$ has compound symmetry. This is the assumption that is used to
consider inferences on the fixed effect parameters.


TABLE 28.22

Fit Statistics for Family Attitudes Data

Description                 Value   Type
-2 Res Log Likelihood       480.2   UN@UN
-2 Res Log Likelihood       482.6   UN@CS
-2 Res Log Likelihood       487.8   UN@AR(1)
-2 Res Log Likelihood       433.0   UN 9x9
AIC (smaller is better)     502.2   UN@UN
AIC (smaller is better)     496.6   UN@CS
AIC (smaller is better)     501.8   UN@AR(1)
AIC (smaller is better)     523.0   UN 9x9
AICC (smaller is better)    504.3   UN@UN
AICC (smaller is better)    497.5   UN@CS
AICC (smaller is better)    502.7   UN@AR(1)
AICC (smaller is better)    569.5   UN 9x9
BIC (smaller is better)     511.4   UN@UN
BIC (smaller is better)     502.5   UN@CS
BIC (smaller is better)     507.6   UN@AR(1)
BIC (smaller is better)     560.5   UN 9x9
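
The information criteria in Table 28.22 are penalized versions of the -2 residual log likelihood, which is why the unstructured 9 x 9 matrix has the smallest -2 residual log likelihood yet the largest AIC, AICC, and BIC. The Python sketch below recomputes AIC and BIC as a check. The numbers of covariance parameters (11, 7, 7, and 45) are not stated in the text and are inferred here from the structures themselves. The BIC penalty is assumed to use the 17 families as the sample size, an assumption that reproduces the tabled values to within rounding.

```python
from math import log

# -2 residual log likelihood, AIC, and BIC copied from Table 28.22.
# n_cov_parms is inferred, not taken from the text.
fits = {
    "UN@UN":    {"neg2_res_loglik": 480.2, "n_cov_parms": 11, "aic": 502.2, "bic": 511.4},
    "UN@CS":    {"neg2_res_loglik": 482.6, "n_cov_parms": 7,  "aic": 496.6, "bic": 502.5},
    "UN@AR(1)": {"neg2_res_loglik": 487.8, "n_cov_parms": 7,  "aic": 501.8, "bic": 507.6},
    "UN 9x9":   {"neg2_res_loglik": 433.0, "n_cov_parms": 45, "aic": 523.0, "bic": 560.5},
}

N_FAMILIES = 17  # 10 urban + 7 rural; assumed sample size for the BIC penalty


def aic(neg2_loglik, n_parms):
    """AIC = -2 log likelihood + 2 * (number of covariance parameters)."""
    return neg2_loglik + 2 * n_parms


def bic(neg2_loglik, n_parms, n_subjects):
    """BIC = -2 log likelihood + (number of parameters) * log(sample size)."""
    return neg2_loglik + n_parms * log(n_subjects)


for name, fit in fits.items():
    print(name,
          round(aic(fit["neg2_res_loglik"], fit["n_cov_parms"]), 1),
          round(bic(fit["neg2_res_loglik"], fit["n_cov_parms"], N_FAMILIES), 1))

best_by_aic = min(fits, key=lambda name: fits[name]["aic"])
print("smallest AIC:", best_by_aic)
```

The 38 additional parameters of the unstructured 9 x 9 matrix lower the -2 residual log likelihood by about 50 relative to UN@CS, which is not enough to offset the penalty of 76 in AIC.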

The SAS-Mixed commands used to obtain tests of the fixed effects are given in Table 28.23.
The data were sorted prior to this analysis since the SAS-Mixed procedure sorts
the levels of the variables in the CLASSES statement alphabetically. Sorting the data helps to match the
covariance parameter estimates with the appropriate family member. The estimates of
the covariance parameters are shown in Table 28.24. Note that

\[
\hat{\Theta} =
\begin{bmatrix}
\hat{\theta}_{FF} & \hat{\theta}_{FM} & \hat{\theta}_{FS} \\
\hat{\theta}_{MF} & \hat{\theta}_{MM} & \hat{\theta}_{MS} \\
\hat{\theta}_{SF} & \hat{\theta}_{SM} & \hat{\theta}_{SS}
\end{bmatrix}
=
\begin{bmatrix}
8.1570 & 2.7308 & 1.2415 \\
2.7308 & 9.7410 & 2.6035 \\
1.2415 & 2.6035 & 14.7547
\end{bmatrix}
\quad \text{and} \quad \hat{\rho} = 0.9637
\]

The subscripts in $\Theta$ have been changed to correspond to father (F), mother (M), and son (S).
The resulting estimate of the 9 x 9 covariance matrix $\Theta \otimes V$ is given in Table 28.25 and the
tests on the fixed effects are shown in Table 28.26.
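
The estimated 9 x 9 matrix in Table 28.25 can be reproduced from the seven covariance parameter estimates in Table 28.24 by forming the direct (Kronecker) product of the family member matrix with a compound symmetric time matrix. The following Python sketch illustrates the computation using only the standard library. The helper `kronecker` is a hypothetical name written for this illustration and is not part of SAS. Because the estimates are rounded to four decimals, the products agree with Table 28.25 only to about three decimals.

```python
def kronecker(a, b):
    """Direct (Kronecker) product of two matrices given as lists of rows."""
    return [[a_ij * b_kl for a_ij in a_row for b_kl in b_row]
            for a_row in a for b_row in b]


# Family member covariance estimates from Table 28.24, ordered F, M, S.
theta = [
    [8.1570, 2.7308, 1.2415],
    [2.7308, 9.7410, 2.6035],
    [1.2415, 2.6035, 14.7547],
]

# Compound symmetric time matrix with unit diagonal and the tabled correlation.
rho = 0.9637
v_cs = [
    [1.0, rho, rho],
    [rho, 1.0, rho],
    [rho, rho, 1.0],
]

sigma = kronecker(theta, v_cs)

for row in sigma:
    print("  ".join(f"{value:8.4f}" for value in row))
```

Rows and columns 1 to 3 correspond to the father at the three times, 4 to 6 to the mother, and 7 to 9 to the son, matching the layout of Table 28.25.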


TABLE 28.23

SAS Commands Used to Obtain the Tests of Fixed
Effects Table and Covariance Parameter Estimates

PROC SORT DATA=USUAL; BY F A M T;
RUN;

PROC MIXED DATA=USUAL MAXITER=500;
TITLE 'ANALYSIS ASSUMING TYPE=UN@CS';
CLASSES A M T F;
MODEL ATTITUDE=A|M|T/DDFM=BETWITHIN;
REPEATED M T/TYPE=UN@CS R SUBJECT=F;
PARMS 10, 1, 10, 1, 1, 10, .5;
ODS RTF SELECT ALL;
RUN;


TABLE 28.24

Covariance Parameter Estimates

Covariance Parameter   Subject   Estimate
M UN(1,1)              F           8.1570
UN(2,1)                F           2.7308
UN(2,2)                F           9.7410
UN(3,1)                F           1.2415
UN(3,2)                F           2.6035
UN(3,3)                F          14.7547
T Corr                 F           0.9637
TABLE 28.25

Estimated Covariance Matrix for Each Family

Estimated R Matrix for F 1

Row     Col1     Col2     Col3     Col4     Col5     Col6     Col7     Col8     Col9
1     8.1570   7.8606   7.8606   2.7308   2.6315   2.6315   1.2415   1.1964   1.1964
2     7.8606   8.1570   7.8606   2.6315   2.7308   2.6315   1.1964   1.2415   1.1964
3     7.8606   7.8606   8.1570   2.6315   2.6315   2.7308   1.1964   1.1964   1.2415
4     2.7308   2.6315   2.6315   9.7410   9.3870   9.3870   2.6035   2.5089   2.5089
5     2.6315   2.7308   2.6315   9.3870   9.7410   9.3870   2.5089   2.6035   2.5089
6     2.6315   2.6315   2.7308   9.3870   9.3870   9.7410   2.5089   2.5089   2.6035
7     1.2415   1.1964   1.1964   2.6035   2.5089   2.5089  14.7547  14.2186  14.2186
8     1.1964   1.2415   1.1964   2.5089   2.6035   2.5089  14.2186  14.7547  14.2186
9     1.1964   1.1964   1.2415   2.5089   2.5089   2.6035  14.2186  14.2186  14.7547


TABLE 28.26

Tests of Fixed Effects

Type III Tests of Fixed Effects

Effect   Num df   Den df   F-Value   Pr > F
A             1       15      8.55   0.0105
M             2       30     10.12   0.0004
AxM           2       30      1.56   0.2271
T             2       30    184.11  <0.0001
AxT           2       30     27.89  <0.0001
MxT           4       60      0.80   0.5271
AxMxT         4       60      1.23   0.3080


An examination of Table 28.26 reveals that A, M, T, and AxT are statistically significant. 
To explore these effects, one should look at the main effect means for family member (M) 
and the two-way means for area by time (A x T). 

The SAS-Mixed commands to compute the corresponding least squares means and to
make pairwise comparisons among them are shown in Table 28.27. These commands
should be appended to those in Table 28.23. The least squares means for M and AxT are
shown in Tables 28.28 and 28.29, respectively. Interesting pairwise comparisons among
these means are given in Tables 28.30 through 28.32. An examination of Tables 28.28 and 28.30
reveals that the son's mean is significantly smaller than both of the parent means, and that
the father and mother means are not significantly different from one another.

An examination of Table 28.29 suggests that means are increasing over time for both
rural families and urban families. However, an examination of Table 28.31 reveals that the
0 and 6 month means for rural families are not significantly different, but both are
significantly smaller than the 12 month mean. For urban families all time means are significantly
different from one another. An examination of Table 28.32 reveals that urban families
have means that are significantly smaller than those of rural families at 0 and 12 months.
TABLE 28.27

Additional SAS Commands Used to Obtain Least Squares
Means and Pairwise Comparisons among Them

LSMEANS M A*T/PDIFF;
ODS RTF EXCLUDE ALL;
ODS LISTING EXCLUDE ALL;
ODS OUTPUT LSMEANS=LSMS DIFFS=LSMDIFFS;
RUN;

PROC PRINT DATA=LSMS; WHERE EFFECT='M';
ODS RTF SELECT ALL;
TITLE 'FAMILY MEMBER LEAST SQUARES MEANS';
RUN;

PROC PRINT DATA=LSMS; WHERE EFFECT='A*T';
TITLE 'AREA X TIME LEAST SQUARES MEANS';
RUN;

PROC PRINT DATA=LSMDIFFS; WHERE EFFECT='M';
TITLE4 'FAMILY MEMBER DIFFERENCES';
RUN;

PROC PRINT DATA=LSMDIFFS; WHERE EFFECT='A*T' AND A=_A;
TITLE4 'TIME DIFFERENCES FOR EACH AREA';
RUN;

PROC PRINT DATA=LSMDIFFS; WHERE EFFECT='A*T' AND T=_T;
TITLE4 'AREA DIFFERENCES FOR EACH TIME POINT';
RUN;

ODS RTF CLOSE;
QUIT;


TABLE 28.28

Family Member Least Squares Means

Effect   A   M   T   Estimate   Standard Error   df   t-Value   Pr > |t|
M        —   F   —    17.6643           0.6952   30     25.41    <0.0001
M        —   M   —    16.9714           0.7597   30     22.34    <0.0001
M        —   S   —    12.8810           0.9349   30     13.78    <0.0001


TABLE 28.29

Two-Way Least Squares Means for Area and Time Combinations

Effect   A       M    T   Estimate   Standard Error   df   t-Value   Pr > |t|
AxT      Rural   —    0    16.3333           0.8527   30     19.16    <0.0001
AxT      Rural   —    6    16.3810           0.8527   30     19.21    <0.0001
AxT      Rural   —   12    19.6190           0.8527   30     23.01    <0.0001
AxT      Urban   —    0    13.1000           0.7134   30     18.36    <0.0001
AxT      Urban   —    6    14.3000           0.7134   30     20.04    <0.0001
TABLE 28.30

Family Member Comparisons

Effect   A   M   T   A   M   T   Estimate   Standard Error   df   t-Value   Pr > |t|
M        —   F   —   —   M   —     0.6929           0.8584   30      0.81     0.4259
M        —   F   —   —   S   —     4.7833           1.1001   30      4.35     0.0001
M        —   M   —   —   S   —     4.0905           1.0690   30      3.83     0.0006


TABLE 28.31

Time Comparisons within Each Area

Effect   A       M    T   A       M    T   Estimate   Standard Error   df   t-Value   Pr > |t|
AxT      Rural   —    0   Rural   —    6   -0.04762           0.2299   30     -0.21     0.8373
AxT      Rural   —    0   Rural   —   12    -3.2857           0.2299   30    -14.29    <0.0001
AxT      Rural   —    6   Rural   —   12    -3.2381           0.2299   30    -14.09    <0.0001
AxT      Urban   —    0   Urban   —    6    -1.2000           0.1923   30     -6.24    <0.0001
AxT      Urban   —    0   Urban   —   12    -2.2000           0.1923   30    -11.44    <0.0001
AxT      Urban   —    6   Urban   —   12    -1.0000           0.1923   30     -5.20    <0.0001


TABLE 28.32

Area Comparisons within Each Time Level

Effect   A       M    T   A       M    T   Estimate   Standard Error   df   t-Value   Pr > |t|
AxT      Rural   —    0   Urban   —    0     3.2333           1.1118   30      2.91     0.0068
AxT      Rural   —    6   Urban   —    6     2.0810           1.1118   30      1.87     0.0710
AxT      Rural   —   12   Urban   —   12     4.3190           1.1118   30      3.88     0.0005

The difference between the rural family and urban family means at 6 months is not large
enough to achieve significance at the 0.05 level.
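
Because every family has a complete set of nine observations, the least squares means in Tables 28.28 and 28.29 appear to coincide with simple averages of the cell means. The Python sketch below checks this against the data as listed in Table 28.19. The function names are hypothetical, and the equality with the SAS least squares means is an observation verified numerically here, not a claim about how the Mixed procedure computes them in general.

```python
# (area, family, scores): son T1-T3, father T1-T3, mother T1-T3,
# copied from the data lines in Table 28.19.
data = [
    ("URBAN", 1, (17, 17, 19, 18, 19, 21, 16, 16, 18)),
    ("URBAN", 2, (12, 14, 15, 19, 19, 21, 16, 16, 18)),
    ("URBAN", 3, (8, 10, 11, 16, 18, 19, 11, 12, 12)),
    ("URBAN", 4, (5, 7, 7, 12, 12, 13, 13, 14, 14)),
    ("URBAN", 5, (2, 5, 6, 12, 14, 14, 14, 16, 18)),
    ("URBAN", 6, (9, 11, 11, 16, 17, 18, 14, 15, 16)),
    ("URBAN", 7, (8, 9, 9, 19, 20, 20, 15, 16, 18)),
    ("URBAN", 8, (13, 14, 16, 16, 17, 18, 18, 18, 20)),
    ("URBAN", 9, (11, 12, 13, 13, 16, 17, 7, 8, 10)),
    ("URBAN", 10, (19, 20, 20, 13, 15, 15, 11, 12, 12)),
    ("RURAL", 1, (12, 11, 14, 18, 19, 22, 16, 16, 19)),
    ("RURAL", 2, (13, 13, 17, 16, 15, 19, 19, 19, 23)),
    ("RURAL", 3, (12, 13, 16, 19, 18, 22, 17, 16, 20)),
    ("RURAL", 4, (18, 18, 21, 23, 23, 26, 23, 22, 26)),
    ("RURAL", 5, (15, 14, 16, 15, 15, 19, 17, 17, 20)),
    ("RURAL", 6, (6, 6, 10, 15, 16, 19, 18, 19, 21)),
    ("RURAL", 7, (16, 17, 18, 17, 17, 21, 18, 20, 23)),
]

MEMBERS = ("S", "F", "M")
MONTHS = (0, 6, 12)
AREAS = ("RURAL", "URBAN")


def cell_mean(area, member, month):
    """Average score for one area x family member x time cell."""
    column = 3 * MEMBERS.index(member) + MONTHS.index(month)
    values = [scores[column] for a, _, scores in data if a == area]
    return sum(values) / len(values)


def area_time_mean(area, month):
    """Area x time mean: unweighted average of the three member cell means."""
    return sum(cell_mean(area, m, month) for m in MEMBERS) / len(MEMBERS)


def member_mean(member):
    """Family member mean: unweighted average over the six area x time cells."""
    cells = [cell_mean(a, member, t) for a in AREAS for t in MONTHS]
    return sum(cells) / len(cells)


for area in AREAS:
    print(area, [round(area_time_mean(area, t), 4) for t in MONTHS])
print({m: round(member_mean(m), 4) for m in MEMBERS})
```

Note that the family member means weight the rural and urban areas equally even though there are 7 rural and 10 urban families, which is why they differ from the raw averages over all 17 families.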






28.3 Multilocation Experiment 

This section considers an experiment involving three drugs in which each subject was
measured repeatedly at three different time points. The data were collected by three
different investigators (or in three different centers). The data are shown in Table 28.33.
Note that there are quite a few missing data points, as indicated by the empty cells.

The analysis assumes that centers and subjects are random effects, and that drugs and
times are fixed effects. If one assumes that the repeated measures satisfy the
split-plot-in-time assumptions, the model for the data in Table 28.33 is

\[
y_{ijk\ell} = \mu + \eta_i + D_j + \delta_{ij\ell} + T_k + (DT)_{jk} + \varepsilon_{ijk\ell} \tag{28.5}
\]
TABLE 28.33

Multicenter Drug Experiment

Center  Drug  Subject  Y1  Y2  Y3    Center  Drug  Subject  Y1  Y2  Y3    Center  Drug  Subject  Y1  Y2  Y3
R          1        1  17            R          2        1  18  19  21    R          3        1  16  16  18
R          1        2  12  14  15    R          2        2  19            R          3        2  16  16  18
R          1        3  12  11  14    R          2        3  18  19        R          3        3  16  16  19
R          1        4  13  13  17    R          2        4  16  15  19    R          3        4  19  19  23
R          1        5  12  13        R          2        5  19  18        R          3        5  17  16  20
S          1        1  18  18  21    S          2        1  23  23  26    S          3        1  23
S          1        2  15  14  16    S          2        2  15  15  19    S          3        2  17  17  20
S          1        3   6   6        S          2        3  15  16  19    S          3        3  18  19  21
S          1        4  16  17  18    S          2        4  17  17  21    S          3        4  18
T          1        1   8  10  11    T          2        1  16  18        T          3        1  11
T          1        2   5   7   7    T          2        2  12            T          3        2  13  14  14
T          1        3   2   5   6    T          2        3  12            T          3        3  14  16  18
T          1        4   9  11  11    T          2        4  16  17  18    T          3        4  14  15  16
T          1        5   8   9   9    T          2        5  19  20  20    T          3        5  15  16  18
T          1        6  13  14  16    T          2        6  16            T          3        6  18  18  20
T          1        7  11  12        T          2        7  13  16  17    T          3        7   7
T          1        8  19  20        T          2        8  13  15        T          3        8  11



where $D_j$ denotes the effect of the $j$th drug, $T_k$ denotes the $k$th time effect, and $(DT)_{jk}$ denotes the
interaction effect between time and drug. In addition, $\eta_i$ denotes a random center effect with
the assumption that $\eta_i \sim$ i.i.d. $N(0, \sigma_\eta^2)$, and $\delta_{ij\ell}$ denotes a random subject effect for the $\ell$th subject
assigned the $j$th drug in the $i$th center. It is assumed that $\delta_{ij\ell} \sim N(0, \sigma_\delta^2)$. Finally, $\varepsilon_{ijk\ell}$ denotes
the random error associated with the $k$th time for the $\ell$th subject assigned the $j$th drug
in the $i$th center with the assumption that $\varepsilon_{ijk\ell} \sim$ i.i.d. $N(0, \sigma_\varepsilon^2)$. It is also assumed that all
random effects are independently distributed.

If the split-plot-in-time assumption is not appropriate, then the model would be

\[
y_{ijk\ell} = \mu + \eta_i + D_j + T_k + (DT)_{jk} + \varepsilon_{ijk\ell}
\]

where it is assumed that $\eta_i \sim$ i.i.d. $N(0, \sigma_\eta^2)$ and

\[
\begin{bmatrix} \varepsilon_{ij1\ell} \\ \varepsilon_{ij2\ell} \\ \varepsilon_{ij3\ell} \end{bmatrix}
\sim N_3\!\left(
\begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix},
\begin{bmatrix}
\sigma_{11} & \sigma_{12} & \sigma_{13} \\
\sigma_{21} & \sigma_{22} & \sigma_{23} \\
\sigma_{31} & \sigma_{32} & \sigma_{33}
\end{bmatrix}
\right)
= N_3(\mathbf{0}, \Sigma) \quad \text{(say)} \tag{28.6}
\]

and the analysis would depend on the structure assumed for $\Sigma$. Note that the model in
Equation 28.6 is the same as the model in Equation 28.5 except that the random effect
corresponding to a subject, $\delta_{ij\ell}$, has been removed from the model. One might also want to include
person as a random effect for some $\Sigma$, such as when $\Sigma$ has an AR(1) structure.

Four covariance structures for the repeated measures were considered: unstructured, 
compound symmetry, AR(1), and AR(1) plus a random subject effect. The BIC criterion was 
smallest for the unstructured assumption while AIC and AICC were both smallest for the 
AR(1) plus a random subject effect structure. For this latter structure the estimate of p was 
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TABLE 28.34 

SAS Commands Used to Analyze the Data in Table 28.33

TITLE 'Ex. 28.3 A MULTI-CENTER TRIAL';

DATA ONE;
   INPUT CENTER $ DRUG SUBJ Y1-Y3 @@;
CARDS;
R 1 1 17  .  .   R 2 1 18 19 21   R 3 1 16 16 18
R 1 2 12 14 15   R 2 2 19  .  .   R 3 2 16 16 18
R 1 3 12 11 14   R 2 3 18 19  .   R 3 3 16 16 19
R 1 4 13 13 17   R 2 4 16 15 19   R 3 4 19 19 23
R 1 5 12 13  .   R 2 5 19 18  .   R 3 5 17 16 20
S 1 1 18 18 21   S 2 1 23 23 26   S 3 1 23  .  .
S 1 2 15 14 16   S 2 2 15 15 19   S 3 2 17 17 20
S 1 3  6  6  .   S 2 3 15 16 19   S 3 3 18 19 21
S 1 4 16 17 18   S 2 4 17 17 21   S 3 4 18  .  .
T 1 1  8 10 11   T 2 1 16 18  .   T 3 1 11  .  .
T 1 2  5  7  7   T 2 2 12  .  .   T 3 2 13 14 14
T 1 3  2  5  6   T 2 3 12  .  .   T 3 3 14 16 18
T 1 4  9 11 11   T 2 4 16 17 18   T 3 4 14 15 16
T 1 5  8  9  9   T 2 5 19 20 20   T 3 5 15 16 18
T 1 6 13 14 16   T 2 6 16  .  .   T 3 6 18 18 20
T 1 7 11 12  .   T 2 7 13 16 17   T 3 7  7  .  .
T 1 8 19 20  .   T 2 8 13 15  .   T 3 8 11  .  .
;
RUN;

DATA TWO; SET ONE; DROP Y1-Y3;
   IF CENTER='R ' THEN SUBJ=100+10*DRUG+SUBJ;
   IF CENTER='S ' THEN SUBJ=200+10*DRUG+SUBJ;
   IF CENTER='T ' THEN SUBJ=300+10*DRUG+SUBJ;
   TIME=1; Y=Y1; OUTPUT;
   TIME=2; Y=Y2; OUTPUT;
   TIME=3; Y=Y3; OUTPUT;
RUN;

ODS RTF FILE='C:\TEMP.RTF';

PROC MIXED DATA=TWO;
   TITLE2 'ANALYSIS USING MIXED - ASSUMING AN UNSTRUCTURED';
   TITLE3 'COVARIANCE MATRIX FOR THE REPEATED MEASURES';
   CLASSES CENTER SUBJ DRUG TIME;
   MODEL Y=DRUG | TIME/DDFM = SATTERTH;
   LSMEANS DRUG|TIME/PDIFF;
   ODS LISTING EXCLUDE LSMEANS DIFFS;
   ODS OUTPUT LSMEANS=LSMS DIFFS=PDIFFS;
   ODS RTF EXCLUDE LSMEANS DIFFS;
   RANDOM CENTER;
   REPEATED TIME/SUBJECT=SUBJ TYPE=UN R=2;
RUN;

PROC PRINT DATA=LSMS; WHERE EFFECT='DRUG';
PROC PRINT DATA=PDIFFS; WHERE EFFECT='DRUG';
PROC PRINT DATA=LSMS; WHERE EFFECT='TIME';
PROC PRINT DATA=PDIFFS; WHERE EFFECT='TIME';
RUN;

ODS RTF CLOSE;
-0.3141, which does not seem to make sense from a philosophical point of view as this
would mean that the correlation between the first and second time points is negative, but 
the correlation between the first and third time points is positive, which does not seem 
reasonable. Consequently, inferences on the fixed effects will be obtained by assuming the 
repeated measures have an unstructured covariance structure. 

The SAS-Mixed commands used to obtain inferences on the fixed effects are given in 
Table 28.34. The estimates of the covariance parameters are shown in Table 28.35, and 
hypothesis tests for the fixed effects are shown in Table 28.36. An examination of Table 
28.36 reveals that the Drug and Time main effects are both significant and the Drug x Time 
interaction effect is not significant. Consequently, comparisons can be made among the 
drug and time main effect means. Table 28.37 gives the drug main effect means and Table 
28.38 gives pairwise comparisons among the drug main effect means. An examination of 
these two tables reveals that the drug 1 mean is significantly smaller than both the drug 2 


TABLE 28.35

Estimates of Covariance Parameters

Covariance Parameter Estimates

Covariance Parameter   Subject   Estimate
CENTER                             4.1996
UN(1,1)                SUBJ       10.9733
UN(2,1)                SUBJ       10.1984
UN(2,2)                SUBJ       10.5365
UN(3,1)                SUBJ        9.9781
UN(3,2)                SUBJ        9.3742
UN(3,3)                SUBJ        9.9515


TABLE 28.36

Hypothesis Tests of Fixed Effects

Type III Tests of Fixed Effects

Effect        Num df   Den df   F-Value   Pr > F
Drug          2        44.2      11.35     0.0001
Time          2        34.6     134.49    <0.0001
Drug x Time   4        34.4       1.04     0.4022


TABLE 28.37

Drug Least Squares Means

Effect   Drug   Time   Estimate   Standard Error   df     t-Value   Pr > |t|
Drug     1      -      13.1182    1.4176           2.44    9.25     0.0059
Drug     2      -      18.0637    1.4187           2.45   12.73     0.0027
Drug     3      -      16.9923    1.4186           2.45   11.98     0.0031


TABLE 28.38

Pairwise Comparison between Drug Main Effect Means

Effect   Drug   Time   Drug   Time   Estimate   Standard Error   df     t-Value   Pr > |t|
Drug     1      -      2      -      -4.9454    1.0929           44.1   -4.53     <0.0001
Drug     1      -      3      -      -3.8740    1.0921           44     -3.55      0.0009
Drug     2      -      3      -       1.0714    1.0943           44.4    0.98      0.3328

TABLE 28.39

Time Least Squares Means

Effect   Drug   Time   Estimate   Standard Error   df     t-Value   Pr > |t|
Time     -      1      14.8982    1.2767           1.6    11.67     0.0153
Time     -      2      15.6157    1.2753           1.57   12.24     0.0152
Time     -      3      17.6602    1.2722           1.59   13.88     0.0119


TABLE 28.40

Pairwise Comparisons between Time Main Effect Means

Effect   Drug   Time   Drug   Time   Estimate   Standard Error   df     t-Value   Pr > |t|
Time     -      1      -      2      -0.7175    0.1651           38.4    -4.34    <0.0001
Time     -      1      -      3      -2.7619    0.1698           30.8   -16.26    <0.0001
Time     -      2      -      3      -2.0445    0.2199           33.8    -9.30    <0.0001

TABLE 28.41

Data from a Soil Study Experiment

IRR 1
                     REP 1                                      REP 2
             Dur 1                Dur 2                Dur 1                Dur 2
Depth   Site A Site B Site C   Site A Site B Site C   Site A Site B Site C   Site A Site B Site C
1        5.91   5.98   5.94     6.73   6.44   6.22     5.94   5.83   5.91     6.35   6.16   6.26
2        5.94   6.05   6.18     6.82   6.84   6.64     6.10   5.96   5.97     6.42   6.52   6.20
3        7.40   7.31   7.44     8.12   7.89   7.96     7.55   7.15   7.13     7.96   7.63   7.34
4        7.14   7.14   7.20     7.84   7.87   7.77     7.03   7.60   7.56     7.16   7.38   6.83
5        7.62   7.62   7.67     8.56   8.54   8.65     7.59   7.60   7.59     8.62   8.61   8.45
6        7.61   7.63   7.62     8.60   8.59   8.29     7.59   7.61   7.51     8.65   8.51   8.49

IRR 2
                     REP 1                                      REP 2
             Dur 1                Dur 2                Dur 1                Dur 2
Depth   Site A Site B Site C   Site A Site B Site C   Site A Site B Site C   Site A Site B Site C
1        5.81   5.86   5.59     6.62   6.48   6.14     5.71   5.59   5.19     6.61   6.51   6.00
2        6.06   5.90   5.84     6.24   6.28   6.25     5.69   5.46   5.30     6.31   6.39   6.32
3        6.98   6.76   7.00     8.47   8.33   8.63     7.38   7.73   7.67     7.81   7.78   7.80
4        6.19   6.14   6.78     7.77   7.66   8.05     6.99   7.03   7.42     7.63   7.46   7.09
5        7.42   7.48   7.55     8.40   8.36   8.34     7.62   6.75   7.55     8.19   8.46   8.50
6        7.54   7.55   7.31     8.54   8.46   8.44     7.53   7.54   7.54     8.46   8.59   8.66
mean and the drug 3 mean. In addition, there is no significant difference between the
drug 2 and drug 3 means. Table 28.39 gives the time main effect means and Table 28.40 
gives pairwise comparisons between the time means. The time means increase over time, 
and all pairwise comparisons are statistically significant. 
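
The t-values in Table 28.40 can be checked, to rounding, from the published summaries. The following is a small illustrative sketch in Python (not part of the SAS analysis; the function name `pairwise_t` is hypothetical). It divides each difference of the time least squares means in Table 28.39 by the standard error of that difference in Table 28.40.

```python
# Time least squares means (Table 28.39) and the standard errors of the
# pairwise differences (Table 28.40), both as printed, rounded to 4 decimals.
time_means = {1: 14.8982, 2: 15.6157, 3: 17.6602}
diff_se = {(1, 2): 0.1651, (1, 3): 0.1698, (2, 3): 0.2199}


def pairwise_t(means, standard_errors):
    """Return {(a, b): (difference, t ratio)} for each tabulated pair."""
    results = {}
    for (a, b), se in standard_errors.items():
        difference = means[a] - means[b]
        results[(a, b)] = (difference, difference / se)
    return results


comparisons = pairwise_t(time_means, diff_se)
for pair, (difference, t_ratio) in comparisons.items():
    print(pair, round(difference, 4), round(t_ratio, 2))
```

Because the inputs are already rounded, the ratios agree with the tabled t-values only to about two decimal places.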


28.4 Exercises 

28.1 An experiment consists of eight columns of soil in tubes that are 20 cm in diameter
and 100 cm long. Four of the columns were assigned to each type of irrigation
(Irr 1 and Irr 2), and two columns within each type of irrigation were assigned to
each irrigation cycle [one watering for one hour each day (Dur 1) and watering
three times a day for 20 min each watering (Dur 2)]. The columns were
exposed to the sun and were oriented so that each column had a southern
exposure. At the end of the study, three core soil samples were taken from each
tube: one core was taken from the south side, one from the middle, and one from
the north side of the column. The pH of the soil was measured at six
depths (0, 20, 40, 60, 80, and 100 cm) within each core sample. The data are shown
in Table 28.41.

1) Determine the experimental units used in this study assuming all repeated 
measures in this experiment satisfy split-plot-in-time type assumptions. 

2) Determine the design and treatment structure for each size of experimental 
unit. 

3) Write out an appropriate model and key out the analysis of variance table. 

4) Carry out an appropriate repeated measures analysis for the following data 
set assuming all repeated measures in this experiment satisfy split-plot-in- 
time type assumptions. 

5) Consider several other kinds of structures for the repeated measures in this 
experiment. What covariance structure would you select for these data? 
Why? 

6) Give a complete analysis of the data assuming the covariance structure
selected in 5.





Analysis of Crossover Designs 


Crossover designs are used to compare treatments that are administered to an experimental
unit (such as an animal or a person) in a sequence. That is, each experimental unit is
subjected to each treatment in a predetermined sequence. The objective of crossover
designs is to eliminate between-experimental unit variation in comparing treatments by
observing treatments applied to the same experimental unit.

Although crossover designs eliminate between-experimental unit variation from treatment
comparisons, other problems may arise in the form of carryover or residual effects.
Carryover effects occur, say, when treatment A is given first and its effect has not worn off
by the time treatment B is applied. If this lingering effect of A interferes with the response
of the subject to treatment B (either positively or negatively), then there is a residual effect
of treatment A on the response to treatment B.

The crossover design model must contain a sequence effect, a time effect, a treatment effect, 
carryover effects, an experimental unit error term, and a time interval error term. The first 
section discusses a general model and its assumptions, and the last two sections discuss 
crossover designs for two treatments in two periods and more than two periods, respectively. 

In the general crossover design, t treatments are compared where each treatment is
observed on each of the experimental units; that is, the treatments are applied in a specified
sequence to the experimental units. The experimenter constructs s sequences of the t
treatments and randomly assigns $n_i$ experimental units to the $i$th sequence. Table 29.1 contains
a set of sequences of three treatments that could be applied to experimental units. The
assignment of sequences to subjects (a possible experimental unit) means that the subject
is the experimental unit for sequences; the assignment of a treatment to a time interval
means that the time interval is the experimental unit for treatments.
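
As a small illustration, the six sequences listed in Table 29.1 are simply all orderings of the three treatments. The Python sketch below (an illustration only; the helper name `all_sequences` is hypothetical) enumerates them. An experimenter is free to use only a subset of such sequences, so this is one possible choice rather than a requirement of the design.

```python
from itertools import permutations


def all_sequences(treatments):
    """Every ordering of the treatments, one treatment per period."""
    return ["".join(order) for order in permutations(treatments)]


sequences = all_sequences("ABC")
for number, sequence in enumerate(sequences, start=1):
    # Columns: sequence number, then the treatment given in periods 1, 2, 3.
    print(number, " ".join(sequence))
```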


29.1 Definitions, Assumptions, and Models 

A model to describe the response of an observation from the $\ell$th animal assigned to the $i$th
sequence during the $j$th time period is

$$y_{ijk\ell} = \mu + S_i + \delta_{i\ell} + P_j + T_k + \varepsilon_{ijk\ell} \tag{29.1}$$

$i = 1, 2, \ldots, s$, $j = 1, 2, \ldots, p$, $k = 1, 2, \ldots, t$, and $\ell = 1, 2, \ldots, n_i$




TABLE 29.1

Possible Set of Sequences for Applying Three
Treatments (A, B, and C) to Experimental Units

                Time/Period
Sequence     1      2      3
1            A      B      C
2            A      C      B
3            B      A      C
4            B      C      A
5            C      A      B
6            C      B      A


In the above model $S_i$ is the effect of the $i$th sequence, $P_j$ is the $j$th period effect, and $T_k$ is
the effect of the $k$th treatment, where the value of $k$ is determined by the combination of $i$
and $j$ for the $i$th sequence and the $j$th period. Under ideal conditions, one would also have
$\delta_{i\ell} \sim$ i.i.d. $N(0, \sigma_\delta^2)$, $\varepsilon_{ijk\ell} \sim$ i.i.d. $N(0, \sigma_\varepsilon^2)$, and all $\delta_{i\ell}$ and $\varepsilon_{ijk\ell}$ independent of one another.

The ideal conditions on the error terms $\delta_{i\ell}$ and $\varepsilon_{ijk\ell}$ are important in developing an
appropriate analysis. Since the experimental units are randomly assigned to the
sequences, an appropriate assumption is that $\delta_{i\ell} \sim$ i.i.d. $N(0, \sigma_\delta^2)$. Since $\varepsilon_{ijk\ell}$ is the error of
the time interval and is, in some sense, a repeated measure error, it may be that these
errors are not independently distributed. The ideal conditions given above are equivalent
to assuming the covariance matrix of the time period errors satisfies a compound
symmetry condition (see Section 27.1). In that case, the usual analysis of variance
methods for independent $\delta_{i\ell}$ and $\varepsilon_{ijk\ell}$ can be used to analyze the observed data. The
compound symmetry assumption is necessarily met for two-period designs and may be
met for many three-period designs. A time series error structure may be more appropriate
for crossover designs with three or more periods, and the methods discussed in
Chapter 27 should be utilized.
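
To make the compound symmetry remark concrete, the sketch below (illustrative Python; the function name and the variance values 2.0 and 1.0 are hypothetical, not estimates from any data set in this chapter) builds the covariance matrix of the period responses on one experimental unit under the ideal conditions. Every response shares the same subject effect, so every pair of periods has covariance $\sigma_\delta^2$ and every variance is $\sigma_\delta^2 + \sigma_\varepsilon^2$.

```python
def period_covariance(n_periods, var_subject, var_error):
    """Covariance matrix of the period responses on one experimental unit.

    Diagonal entries are var_subject + var_error; every off-diagonal entry
    is var_subject, which is the compound symmetry pattern.
    """
    return [
        [var_subject + (var_error if row == col else 0.0) for col in range(n_periods)]
        for row in range(n_periods)
    ]


# Hypothetical variance components, chosen only for illustration.
cov = period_covariance(3, var_subject=2.0, var_error=1.0)
correlation = cov[0][1] / cov[0][0]  # common correlation between any two periods

for row in cov:
    print(row)
print(round(correlation, 4))
```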

The two-period/two-treatment crossover design is described in Section 29.2 and designs 
with more than two treatments or more than two periods are described in Sections 29.3 
and 29.4. 


29.2 Two Period/Two Treatment Designs 

Consider a two period/two treatment crossover design with two sequences of treatments,
AB and BA, where AB means that treatment A is assigned to an experimental unit in period 1
and treatment B is assigned to the same experimental unit in period 2. Likewise the BA
sequence means that treatment B is assigned to an experimental unit in period 1 and treatment
A is assigned to the same experimental unit in period 2. Also suppose that there are
$n_1$ experimental units assigned to the AB sequence and that there are $n_2$ experimental units
assigned to the BA sequence. Let the model for an observed response be

$$y_{ijk\ell} = \mu + S_i + \delta_{i\ell} + P_j + T_k + \varepsilon_{ijk\ell} \tag{29.2}$$

$i = AB, BA$, $j = 1, 2$, $k = A, B$, and $\ell = 1, 2, \ldots, n_i$
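where the value of the subscript $k$ is determined by the sequence $i$ and period $j$ combination.
Let

$$\mu_{ij} = \mu + S_i + P_j + T_k$$

be a model for the cell mean corresponding to the $i$th sequence and the $j$th period, and let
$\hat{\mu}_{ij}$ be the observed mean of all of the observations in the $(i, j)$th cell. Then note that

$$\begin{aligned}
\hat{\mu}_{11} &= \bar{y}_{11\cdot\cdot} \text{ estimates } \mu + S_{AB} + P_1 + T_A \\
\hat{\mu}_{12} &= \bar{y}_{12\cdot\cdot} \text{ estimates } \mu + S_{AB} + P_2 + T_B \\
\hat{\mu}_{21} &= \bar{y}_{21\cdot\cdot} \text{ estimates } \mu + S_{BA} + P_1 + T_B \\
\hat{\mu}_{22} &= \bar{y}_{22\cdot\cdot} \text{ estimates } \mu + S_{BA} + P_2 + T_A
\end{aligned} \tag{29.3}$$

From Equation 29.3, one can see that

$$\begin{aligned}
\hat{\mu}_{1\cdot} &= \frac{\bar{y}_{11\cdot\cdot} + \bar{y}_{12\cdot\cdot}}{2} \text{ estimates } \mu + S_{AB} + \bar{P}_{\cdot} + \frac{T_A + T_B}{2} \\
\hat{\mu}_{2\cdot} &= \frac{\bar{y}_{21\cdot\cdot} + \bar{y}_{22\cdot\cdot}}{2} \text{ estimates } \mu + S_{BA} + \bar{P}_{\cdot} + \frac{T_A + T_B}{2} \\
\hat{\mu}_{\cdot 1} &= \frac{\bar{y}_{11\cdot\cdot} + \bar{y}_{21\cdot\cdot}}{2} \text{ estimates } \mu + \bar{S}_{\cdot} + P_1 + \frac{T_A + T_B}{2} \\
\hat{\mu}_{\cdot 2} &= \frac{\bar{y}_{12\cdot\cdot} + \bar{y}_{22\cdot\cdot}}{2} \text{ estimates } \mu + \bar{S}_{\cdot} + P_2 + \frac{T_A + T_B}{2}
\end{aligned} \tag{29.4}$$

Note also that the difference in the two sequence means, $\hat{\mu}_{1\cdot} - \hat{\mu}_{2\cdot}$, estimates $S_{AB} - S_{BA}$, the
difference in the two sequence parameters. The difference in the period 1 and period 2
means, $\hat{\mu}_{\cdot 1} - \hat{\mu}_{\cdot 2}$, estimates $P_1 - P_2$. Finally, to estimate the difference in the two treatments,
$T_A - T_B$, one uses $(\hat{\mu}_{11} - \hat{\mu}_{12} - \hat{\mu}_{21} + \hat{\mu}_{22})/2$. Under the ideal conditions, one can show that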
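From Equation 29.5, one can show that

$$\hat{\mu}_{1\cdot} - \hat{\mu}_{2\cdot} \sim N\left[S_{AB} - S_{BA},\ \frac{\sigma_\varepsilon^2 + 2\sigma_\delta^2}{2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\right]$$

$$\hat{\mu}_{\cdot 1} - \hat{\mu}_{\cdot 2} \sim N\left[P_1 - P_2,\ \frac{\sigma_\varepsilon^2}{2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\right]$$

and that

$$\frac{\hat{\mu}_{11} - \hat{\mu}_{12} - \hat{\mu}_{21} + \hat{\mu}_{22}}{2} \sim N\left[T_A - T_B,\ \frac{\sigma_\varepsilon^2}{2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\right]$$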

The form of an ANOVA table corresponding to the two period/two treatment crossover 
design being discussed in this section is shown in Table 29.2. 

The F-statistic for testing $H_0$: $T_A = T_B$ is given by F = TMS/WSEMS where TMS is the
treatment mean square in Table 29.2 and WSEMS is the within subject error mean square.
One rejects $H_0$ if $F > F_{\alpha, 1, n_1 + n_2 - 2}$. A $(1 - \alpha)100\%$ confidence interval for $T_A - T_B$ is given by

$$\frac{\hat{\mu}_{11} - \hat{\mu}_{12} - \hat{\mu}_{21} + \hat{\mu}_{22}}{2} \pm t_{\alpha/2, \nu} \sqrt{\frac{\hat{\sigma}_\varepsilon^2}{2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

where $\nu = n_1 + n_2 - 2$ and $\hat{\sigma}_\varepsilon^2$ = WSEMS. As an example, consider data used by Grizzle
(1965). The data are shown in Table 29.3. The data were analyzed with the SAS®-GLM procedure
using the commands shown in Table 29.4.

Table 29.5 shows the model ANOVA and gives the value of the WSEMS as $\hat{\sigma}_\varepsilon^2 = 1.245$.
Table 29.6 gives the test of $H_0$: $T_A = T_B$ along with the between-subject error mean square,
BSEMS = $\hat{\sigma}_\varepsilon^2 + 2\hat{\sigma}_\delta^2 = 1.001$. The observed significance level for $H_0$ is $\hat{\alpha} = 0.1165$. The treatment
A and B least squares means are shown in Table 29.7 along with the observed significance
level of $\hat{\alpha} = 0.1165$ comparing treatment A to treatment B. The expected mean squares
for the rows in the ANOVA table are shown in Table 29.8. If one wants to find the estimates
of the two-way Sequence x Period means, one can use the commands in Table 29.9. The
two-way means are shown in Table 29.10.
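
The quantities quoted from Tables 29.5 through 29.10 can be reproduced directly from the data in Table 29.3. The following Python sketch is an illustration, not the GLM algorithm itself: it assumes the two-sequence, two-period layout of this section, so that the between-subject and within-subject error sums of squares come from each person's sum and difference of the two responses (each divided by 2), and the one degree of freedom treatment sum of squares is the squared contrast divided by $\frac{1}{2}(1/n_1 + 1/n_2)$. All variable and function names are hypothetical.

```python
# Grizzle (1965) responses from Table 29.3, stored as (period 1, period 2)
# pairs, one pair per person.
seq_ab = [(0.2, 1.0), (0.0, -0.7), (-0.8, 0.2), (0.6, 1.1), (0.3, 0.4), (1.5, 1.2)]
seq_ba = [(1.3, 0.9), (-2.3, 1.0), (0.0, 0.6), (-0.8, -0.3),
          (-0.4, -1.0), (-2.9, 1.7), (-1.9, -0.3), (-2.9, 0.9)]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Sequence x Period cell means (compare Table 29.10).
m11 = mean(p1 for p1, _ in seq_ab)  # AB, period 1, treatment A
m12 = mean(p2 for _, p2 in seq_ab)  # AB, period 2, treatment B
m21 = mean(p1 for p1, _ in seq_ba)  # BA, period 1, treatment B
m22 = mean(p2 for _, p2 in seq_ba)  # BA, period 2, treatment A

# Contrasts of the cell means described after Equation 29.4.
seq_diff = (m11 + m12) / 2 - (m21 + m22) / 2
period_diff = (m11 + m21) / 2 - (m12 + m22) / 2
trt_diff = (m11 - m12 - m21 + m22) / 2


def pooled_ss(combine):
    """Within-sequence sum of squares of a per-person summary, divided by 2."""
    total = 0.0
    for group in (seq_ab, seq_ba):
        summary = [combine(p1, p2) for p1, p2 in group]
        centre = mean(summary)
        total += sum((s - centre) ** 2 for s in summary) / 2
    return total


df_error = len(seq_ab) + len(seq_ba) - 2
bsems = pooled_ss(lambda p1, p2: p1 + p2) / df_error  # between-subject error MS
wsems = pooled_ss(lambda p1, p2: p1 - p2) / df_error  # within-subject error MS

# One degree of freedom treatment mean square and its F ratio.
scale = 0.5 * (1 / len(seq_ab) + 1 / len(seq_ba))
tms = trt_diff ** 2 / scale
f_trt = tms / wsems

print(round(m11, 4), round(m12, 4), round(m21, 4), round(m22, 4))
print(round(trt_diff, 4), round(bsems, 4), round(wsems, 4), round(f_trt, 2))
```

The printed cell means, error mean squares, and F ratio can be compared with Tables 29.10, 29.6, and 29.5.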

The data in Table 29.3 can also be analyzed with the SAS-Mixed procedure using the commands
shown in Table 29.11. The results differ slightly from those obtained from the GLM
analysis due to the fact that the between-subject error mean square is smaller than the 


TABLE 29.2

ANOVA Table for a Two Period/Two Treatment Crossover Experiment

Source of Variation       Degrees of Freedom   Expected Mean Square
Sequence                  1                    $\sigma_\varepsilon^2 + 2\sigma_\delta^2 + Q(\text{Sequence})$
Error (between subject)   $n_1 + n_2 - 2$      $\sigma_\varepsilon^2 + 2\sigma_\delta^2$
Treatment                 1                    $\sigma_\varepsilon^2 + Q(\text{Treatment})$
Period                    1                    $\sigma_\varepsilon^2 + Q(\text{Period})$
Error (within subject)    $n_1 + n_2 - 2$      $\sigma_\varepsilon^2$


TABLE 29.3

Grizzle's (1965) Data

Seq   Period   Trt   Person      Y
AB    1        A     11        0.2
AB    2        B     11        1.0
AB    1        A     12        0.0
AB    2        B     12       -0.7
AB    1        A     13       -0.8
AB    2        B     13        0.2
AB    1        A     14        0.6
AB    2        B     14        1.1
AB    1        A     15        0.3
AB    2        B     15        0.4
AB    1        A     16        1.5
AB    2        B     16        1.2
BA    1        B     21        1.3
BA    2        A     21        0.9
BA    1        B     22       -2.3
BA    2        A     22        1.0
BA    1        B     23        0.0
BA    2        A     23        0.6
BA    1        B     24       -0.8
BA    2        A     24       -0.3
BA    1        B     25       -0.4
BA    2        A     25       -1.0
BA    1        B     26       -2.9
BA    2        A     26        1.7
BA    1        B     27       -1.9
BA    2        A     27       -0.3
BA    1        B     28       -2.9
BA    2        A     28        0.9


within-subject error mean square in the GLM analysis. Consequently, the Mixed procedure
estimates the variance component corresponding to subjects as $\hat{\sigma}_\delta^2 = 0$, whereas GLM would
estimate the same variance component using the method of moments criteria as

$$\hat{\sigma}_\delta^2 = \frac{\text{BSEMS} - \text{WSEMS}}{2} = \frac{1.001 - 1.245}{2} = \frac{-0.244}{2} = -0.122$$

The results using the code in Table 29.11 are not given at this time, and the interested
reader will need to run the code in Table 29.11.
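
The method of moments calculation can be sketched in Python as follows (illustrative only; the variable names are hypothetical). The sums of squares are those printed in Tables 29.5 and 29.6. The last two lines rest on an assumption that is not stated in the text: when the subject variance component is set to zero, the residual variance is taken to be the pooled value obtained by adding the two error sums of squares and their degrees of freedom. Under that assumption the standard error of the treatment difference agrees with the 0.4047 reported in Table 29.15.

```python
from math import sqrt

n_ab, n_ba = 6, 8
df_error = n_ab + n_ba - 2

bse_ss = 12.00666667  # Person(Seq) sum of squares, Table 29.6
wse_ss = 14.94416667  # error sum of squares, Table 29.5

bsems = bse_ss / df_error
wsems = wse_ss / df_error

# Method of moments estimate of the subject variance component (negative here).
sigma2_subject_mom = (bsems - wsems) / 2

# A variance cannot be negative, so a bounded estimate is truncated at zero.
sigma2_subject_bounded = max(sigma2_subject_mom, 0.0)

# ASSUMPTION: with the subject component at zero, pool both error lines.
sigma2_error_pooled = (bse_ss + wse_ss) / (2 * df_error)
se_trt_diff = sqrt(sigma2_error_pooled / 2 * (1 / n_ab + 1 / n_ba))

print(round(sigma2_subject_mom, 4), sigma2_subject_bounded)
print(round(sigma2_error_pooled, 4), round(se_trt_diff, 4))
```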

The analyses described above assume that there is no carryover from the treatment given
in the first period to the observations taken in the second period. Next the case where
carryover exists is considered. When there is carryover, then one can assume that

$$\begin{aligned}
\hat{\mu}_{11} &= \bar{y}_{11\cdot\cdot} \text{ estimates } \mu + S_{AB} + P_1 + T_A \\
\hat{\mu}_{12} &= \bar{y}_{12\cdot\cdot} \text{ estimates } \mu + S_{AB} + P_2 + T_B + \lambda_A \\
\hat{\mu}_{21} &= \bar{y}_{21\cdot\cdot} \text{ estimates } \mu + S_{BA} + P_1 + T_B
\end{aligned} \tag{29.6}$$
TABLE 29.4

SAS-GLM Code to Analyze the Data in Table 29.3

TITLE 'CRSOVER EXAMPLE 29.1 - A TWO PERIOD/TWO TREATMENT DESIGN';
DATA GRIZ;
   INPUT SEQ $ PERIOD TRT $ PERSON Y;
LINES;
AB 1 A 11  0.2
AB 2 B 11  1.0
AB 1 A 12  0.0
AB 2 B 12 -0.7
AB 1 A 13 -0.8
AB 2 B 13  0.2
AB 1 A 14  0.6
AB 2 B 14  1.1
AB 1 A 15  0.3
AB 2 B 15  0.4
AB 1 A 16  1.5
AB 2 B 16  1.2
BA 1 B 21  1.3
BA 2 A 21  0.9
BA 1 B 22 -2.3
BA 2 A 22  1.0
BA 1 B 23  0.0
BA 2 A 23  0.6
BA 1 B 24 -0.8
BA 2 A 24 -0.3
BA 1 B 25 -0.4
BA 2 A 25 -1.0
BA 1 B 26 -2.9
BA 2 A 26  1.7
BA 1 B 27 -1.9
BA 2 A 27 -0.3
BA 1 B 28 -2.9
BA 2 A 28  0.9
;

PROC GLM;
   TITLE2 'STATISTICAL ANALYSIS USING SAS-GLM';
   CLASSES SEQ TRT PERIOD PERSON;
   MODEL Y=SEQ PERSON(SEQ) TRT PERIOD;
   LSMEANS TRT/PDIFF;
   RANDOM PERSON(SEQ);
RUN;


TABLE 29.5

Model ANOVA with the Within Subject Error Mean Square

Source            df   Sum of Squares   Mean Square   F-Value   Pr > F
Model             15   27.96583333      1.86438889    1.50      0.2435
Error             12   14.94416667      1.24534722
Corrected total   27   42.91000000


TABLE 29.6

Type III ANOVA Table for Data in Table 29.3

Source        df   Type III SS   Mean Square   F-Value   Pr > F
Seq           1     4.57333333   4.57333333    3.67      0.0794
Person(Seq)   12   12.00666667   1.00055556    0.80      0.6446
Trt           1     3.56297619   3.56297619    2.86      0.1165
Period        1     6.24297619   6.24297619    5.01      0.0449


TABLE 29.7

Treatment Main Effect Means

                       H0: LSMean1 = LSMean2
Trt   Y LSMean         Pr > |t|
A      0.36875000      0.1165
B     -0.35208333


TABLE 29.8

Table of Expected Mean Squares for the Data in Table 29.3

Source        Type III Expected Mean Square
Seq           Var(Error) + 2 Var[Person(Seq)] + Q(Seq)
Person(Seq)   Var(Error) + 2 Var[Person(Seq)]
Trt           Var(Error) + Q(Trt)
Period        Var(Error) + Q(Period)


TABLE 29.9 

SAS-GLM Code to Analyze the Data in Table 29.3 

PROC GLM; 

TITLE2 'STATISTICAL ANALYSIS USING SAS-GLM'; 
CLASSES SEQ TRT PERIOD PERSON; 

MODEL Y=SEQ PERSON(SEQ) PERIOD SEQ*PERIOD; 
LSMEANS SEQ*PERIOD; 

RANDOM PERSON(SEQ); 

RUN; 


TABLE 29.10

Sequence x Period Means

Seq   Period   Y LSMean
AB    1         0.30000000
AB    2         0.53333333
BA    1        -1.23750000
BA    2         0.43750000
TABLE 29.11

SAS-Mixed Code to Analyze the Data in Table 29.3

PROC MIXED;
   TITLE2 'STATISTICAL ANALYSIS USING SAS-MIXED';
   CLASSES SEQ TRT PERIOD PERSON;
   MODEL Y=SEQ PERIOD TRT;
   LSMEANS TRT/PDIFF;
   RANDOM PERSON(SEQ);
RUN;

PROC MIXED;
   TITLE2 'STATISTICAL ANALYSIS USING SAS-MIXED';
   CLASSES SEQ TRT PERIOD PERSON;
   MODEL Y=SEQ PERIOD SEQ*PERIOD;
   LSMEANS SEQ*PERIOD;
   RANDOM PERSON(SEQ);
RUN;


and

$$\hat{\mu}_{22} = \bar{y}_{22\cdot\cdot} \text{ estimates } \mu + S_{BA} + P_2 + T_A + \lambda_B$$

where $\lambda_A$ is a parameter for the sequence AB corresponding to carryover from treatment
A given in period 1 into period 2, and $\lambda_B$ is a parameter for the sequence BA
corresponding to carryover from treatment B given in period 1 into period 2.

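From Equation 29.6, one can see that

$$\begin{aligned}
\hat{\mu}_{1\cdot} &= \frac{\bar{y}_{11\cdot\cdot} + \bar{y}_{12\cdot\cdot}}{2} \text{ estimates } \mu + S_{AB} + \bar{P}_{\cdot} + \frac{T_A + T_B}{2} + \frac{\lambda_A}{2} \\
\hat{\mu}_{2\cdot} &= \frac{\bar{y}_{21\cdot\cdot} + \bar{y}_{22\cdot\cdot}}{2} \text{ estimates } \mu + S_{BA} + \bar{P}_{\cdot} + \frac{T_A + T_B}{2} + \frac{\lambda_B}{2} \\
\hat{\mu}_{\cdot 1} &= \frac{\bar{y}_{11\cdot\cdot} + \bar{y}_{21\cdot\cdot}}{2} \text{ estimates } \mu + \bar{S}_{\cdot} + P_1 + \frac{T_A + T_B}{2} \\
\hat{\mu}_{\cdot 2} &= \frac{\bar{y}_{12\cdot\cdot} + \bar{y}_{22\cdot\cdot}}{2} \text{ estimates } \mu + \bar{S}_{\cdot} + P_2 + \frac{T_A + T_B}{2} + \frac{\lambda_A + \lambda_B}{2}
\end{aligned} \tag{29.7}$$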


Carryover is said to exist whenever $\lambda_A \neq \lambda_B$. That is, in the no carryover case $\lambda_A$ and $\lambda_B$ do
not actually have to be equal to zero, they just have to be equal to one another. In the case
where $\lambda_A = \lambda_B$, $\lambda_A$ and $\lambda_B$ are confounded with period effects, and one does not need to
include the parameters in model (29.2). In the carryover case, the difference in the two
sequence means, $\hat{\mu}_{1\cdot} - \hat{\mu}_{2\cdot}$, estimates $S_{AB} - S_{BA} + [(\lambda_A - \lambda_B)/2]$. The difference in the period 1
and period 2 means, $\hat{\mu}_{\cdot 1} - \hat{\mu}_{\cdot 2}$, estimates $P_1 - P_2 - [(\lambda_A + \lambda_B)/2]$. Finally, $(\hat{\mu}_{11} - \hat{\mu}_{12} - \hat{\mu}_{21} + \hat{\mu}_{22})/2$
estimates $T_A - T_B + [(\lambda_B - \lambda_A)/2]$. In the case where there is carryover, the second period
data does not help one estimate the difference in the two treatments. When there is
carryover, one must estimate $T_A - T_B$ by $\hat{\mu}_{11} - \hat{\mu}_{21}$, and this estimator only depends on the
period 1 data. One might also note that, if only period 1 data are going to be used to
estimate treatment effect, the model for period 1 can be simplified to

$$y_{k\ell} = \mu + T_k + \varepsilon^*_{k\ell}, \quad k = A, B, \text{ and } \ell = 1, 2, \ldots, n_k \tag{29.8}$$

where $\varepsilon^*_{k\ell} = \delta_{k\ell} + \varepsilon_{k\ell}$. Furthermore,

$$\mathrm{Var}(\hat{\mu}_{11} - \hat{\mu}_{21}) = (\sigma_\varepsilon^2 + \sigma_\delta^2)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)$$

The form of an ANOVA table corresponding to a two period/two treatment crossover 
design when there is carryover is shown in Table 29.12. This table has the same form as 
Table 29.2 except for the expected mean square column. 

If there is no carryover, then it should not matter which sequence a subject is assigned to.
That is, philosophically, $S_{AB} - S_{BA}$ should be equal to zero. Thus, the F-statistic given by
F = SeqMS/BSEMS provides a test for whether there is carryover or not. If $F > F_{\alpha, 1, n_1 + n_2 - 2}$,
then one would conclude that there is a significant carryover effect. It should be noted that 
the test for carryover is a between-subject comparison, and as such, the test is not as 
powerful as a test that is based on a within-subject comparison. It is recommended that 
one should not use the two period/two treatment crossover design if one thinks that there 
might be carryover effects. In some instances, an experimenter might be able to include a 
so-called "wash-out" period between the two periods of the crossover design. A wash-out 
period would be a period of time that is long enough so that any residual effect of the 
treatment given in the first period would be eliminated or washed-out prior to applying 
the second treatment to an experimental unit. It is also extremely important to note that 
the problem of having carryover in a two period/two treatment crossover design is reduced 
and/or eliminated in some crossover designs that have more than two treatments and/or 
more than two periods. Such designs will be considered in the next section. 
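
As a numerical illustration of the carryover test, the Python sketch below (illustrative only; variable names are hypothetical) forms F = SeqMS/BSEMS from the mean squares printed in Tables 29.5 and 29.6. For comparison it also forms the ratio of SeqMS to the within-subject error mean square, which is the ratio that matches the value 3.67 printed for Seq in Table 29.6. The observed F would then be compared with $F_{\alpha, 1, 12}$, which is not computed here.

```python
# Mean squares as printed in Tables 29.5 and 29.6.
seq_ms = 4.57333333  # Seq
bsems = 1.00055556   # Person(Seq), the between-subject error mean square
wsems = 1.24534722   # Error, the within-subject error mean square

# Carryover test described in the text: sequence tested against
# the between-subject error.
f_carryover = seq_ms / bsems

# Ratio against the within-subject error, shown only for comparison
# with the Seq line of Table 29.6.
f_vs_within = seq_ms / wsems

print(round(f_carryover, 2), round(f_vs_within, 2))
```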

As an example of a two period/two treatment crossover experiment where there might be 
carryover, consider once again Grizzle's (1965) data that was shown in Table 29.3. The data will 
be analyzed using the SAS-Mixed procedure using the commands shown in Table 29.13. 

The first set of Mixed commands is used to obtain a test for carryover and a test for treatment
effects should there not be any significant carryover. These commands also give the
treatment main effect means that one can use if there is no carryover. The second set of
commands uses a different model statement in order to get the Sequence x Period two-way
means, and the ESTIMATE option is included in order to obtain an estimate of treatment
effect from period 1 data.

Table 29.14 gives tests on the fixed effects from the first set of Mixed commands in 
Table 29.13. Note that the SEQ effect is significant at the 0.0665 level, indicating that there is 


TABLE 29.12

ANOVA Table for a Two Period/Two Treatment Crossover Experiment

Source of Variation       Degrees of Freedom   Expected Mean Square
Sequence                  1                    $\sigma_\varepsilon^2 + 2\sigma_\delta^2 + Q(\text{Sequence, Carryover})$
Error (between subject)   $n_1 + n_2 - 2$      $\sigma_\varepsilon^2 + 2\sigma_\delta^2$
Treatment                 1                    $\sigma_\varepsilon^2 + Q(\text{Treatment, Carryover})$
Period                    1                    $\sigma_\varepsilon^2 + Q(\text{Period, Carryover})$
Error (within subject)    $n_1 + n_2 - 2$      $\sigma_\varepsilon^2$
TABLE 29.13

SAS-Mixed Code to Analyze the Data in Table 29.3 when Carryover Exists

PROC MIXED;
   TITLE2 'STATISTICAL ANALYSIS USING SAS-MIXED';
   CLASSES SEQ TRT PERIOD PERSON;
   MODEL Y=SEQ PERIOD TRT;
   LSMEANS TRT/PDIFF;
   RANDOM PERSON(SEQ);
RUN;

PROC MIXED;
   TITLE2 'STATISTICAL ANALYSIS USING SAS-MIXED';
   CLASSES SEQ TRT PERIOD PERSON;
   MODEL Y=SEQ PERIOD SEQ*PERIOD;
   LSMEANS SEQ*PERIOD;
   ESTIMATE 'A-B FROM PERIOD 1' SEQ 1 -1 SEQ*PERIOD 1 0 -1 0;
   RANDOM PERSON(SEQ);
RUN;


TABLE 29.14

Tests on Fixed Effects

Type III Tests of Fixed Effects

Effect   Num df   Den df   F-Value   Pr > F
Seq      1        12       4.07      0.0665
Period   1        12       5.56      0.0362
Trt      1        12       3.17      0.1002


some evidence of carryover. Also note that the Trt effect is only significant at the 0.1002 level. 
The Trt main effect means are shown in Table 29.15, and a comparison between the two 
treatment means is also given in this table. 

Table 29.16 gives the tests on fixed effects from the second set of Mixed commands in 
Table 29.13. Note that the test for Seq is the same as that given in Table 29.14. Also note that 
the test for Seq x Per is the same as the test for TRT in Table 29.14. Table 29.17 gives the 
Seq x Per means, and Table 29.18 gives a test that compares the two treatments from only 
the period 1 data. An examination of Table 29.18 reveals that the carryover effect evidently 


TABLE 29.15

Treatment Means and a Test on the Difference of the Two Treatment Means

Least Squares Means

Effect   Trt   Estimate   Standard Error   df   t-Value   Pr > |t|
Trt      A      0.3688    0.2862           12    1.29     0.2218
Trt      B     -0.3521    0.2862           12   -1.23     0.2421

Differences of Least Squares Means

Effect   Trt   Trt   Estimate   Standard Error   df   t-Value   Pr > |t|
Trt      A     B     0.7208     0.4047           12   1.78      0.1002


TABLE 29.16

Additional Tests on Fixed Effects

Type III Tests of Fixed Effects

Effect         Num df   Den df   F-Value   Pr > F
Seq            1        12       4.07      0.0665
Period         1        12       5.56      0.0362
Seq x Period   1        12       3.17      0.1002


TABLE 29.17

Sequence x Period Two-Way Means

Least Squares Means

Effect         Seq   Period   Estimate   Standard Error   df   t-Value   Pr > |t|
Seq x Period   AB    1         0.3000    0.4326           12    0.69     0.5012
Seq x Period   AB    2         0.5333    0.4326           12    1.23     0.2413
Seq x Period   BA    1        -1.2375    0.3747           12   -3.30     0.0063
Seq x Period   BA    2         0.4375    0.3747           12    1.17     0.2656


TABLE 29.18

Treatment A vs Treatment B from Period 1 Estimates

Estimates

Label               Estimate   Standard Error   df   t-Value   Pr > |t|
A-B from Period 1   1.5375     0.5723           12   2.69      0.0198


masked part of the treatment effect in the analysis given in Table 29.7, where no carryover 
was assumed as the treatment effect was not significant there, while the comparison using 
only first-period data finds the treatments to be significantly different. 
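
The period 1 comparison in Table 29.18 can be checked by hand. The Python sketch below is illustrative only: the estimate is the difference of the two period 1 cell means, and the standard error uses $\mathrm{Var}(\hat{\mu}_{11} - \hat{\mu}_{21}) = (\sigma_\varepsilon^2 + \sigma_\delta^2)(1/n_1 + 1/n_2)$. The value used for $\sigma_\varepsilon^2 + \sigma_\delta^2$ is an assumption, namely the pooled error variance obtained by adding the two error sums of squares in Tables 29.5 and 29.6 and dividing by their 24 degrees of freedom, which corresponds to a subject variance component estimated as zero.

```python
from math import sqrt

# Period 1 responses from Table 29.3.
period1_a = [0.2, 0.0, -0.8, 0.6, 0.3, 1.5]                   # sequence AB
period1_b = [1.3, -2.3, 0.0, -0.8, -0.4, -2.9, -1.9, -2.9]    # sequence BA

estimate = sum(period1_a) / len(period1_a) - sum(period1_b) / len(period1_b)

# ASSUMPTION: total variance of one observation taken as the pooled error
# variance (subject variance component estimated as zero).
total_variance = (12.00666667 + 14.94416667) / 24

std_error = sqrt(total_variance * (1 / len(period1_a) + 1 / len(period1_b)))
t_value = estimate / std_error

print(round(estimate, 4), round(std_error, 4), round(t_value, 2))
```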


29.3 Crossover Designs with More than Two Periods 

The statistical analysis of a crossover design that has more than two periods and/or 
more than two treatments when there is no carryover is straightforward. The model in 
Equation 29.1 is appropriate when the ideal conditions on between subject errors and 
within subject errors are satisfied. The SAS commands given in Tables 29.4, 29.9, and 
29.11 can still be used to obtain the useful and interesting statistics. Since analyses when 
there is no carryover are straightforward, this section concentrates on situations where 
carryover exists. 

The first case considered is the case where there are still two treatments, but involves 
sequences having three periods. One possibility is to use the two sequences ABA and BAB. 
For the ABA sequence, an experimental unit would receive treatment A in period 1, treatment
B in period 2, and treatment A again in period 3. The BAB sequence would be handled
similarly. Another possibility is to use the two sequences ABB and BAA. A third possibility
is to use all four of these sequences. That is, ABA, BAB, ABB and BAA would all be used.
This section concentrates on a crossover design using the two sequences ABA and BAB.
The other possibilities can be analyzed similarly.

Table 29.19 identifies the model parameters that can be associated with the sequence by
period combinations of the ABA and BAB crossover design.

Suppose that $n_1$ subjects have been assigned to the ABA sequence and that $n_2$ subjects
have been assigned to the BAB sequence. Let $y_{ij\ell}$ be the observed response from subject $\ell$
in sequence $i$ and period $j$. Let $\hat{\mu}_{ij} = \bar{y}_{ij\cdot}$, $i = 1, 2$; $j = 1, 2, 3$. Under the ideal conditions given
for the error terms for the model in Equation 29.1, one can show that
(29.9)


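Note that the contrast

$$\begin{aligned}
\mu_{11} - \tfrac{1}{2}\mu_{12} - \tfrac{1}{2}\mu_{13} - \mu_{21} + \tfrac{1}{2}\mu_{22} + \tfrac{1}{2}\mu_{23}
&= (\mu + S_{ABA} + P_1 + T_A) - \tfrac{1}{2}(\mu + S_{ABA} + P_2 + T_B + \lambda_A) \\
&\quad - \tfrac{1}{2}(\mu + S_{ABA} + P_3 + T_A + \lambda_B) - (\mu + S_{BAB} + P_1 + T_B) \\
&\quad + \tfrac{1}{2}(\mu + S_{BAB} + P_2 + T_A + \lambda_B) + \tfrac{1}{2}(\mu + S_{BAB} + P_3 + T_B + \lambda_A) \\
&= T_A - T_B
\end{aligned}$$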


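TABLE 29.19

Cell Mean Parameters for a Three Period/Two Treatment Crossover Design

Sequence   Period 1                                Period 2                                            Period 3
ABA        $\mu_{11} = \mu + S_{ABA} + P_1 + T_A$   $\mu_{12} = \mu + S_{ABA} + P_2 + T_B + \lambda_A$   $\mu_{13} = \mu + S_{ABA} + P_3 + T_A + \lambda_B$
BAB        $\mu_{21} = \mu + S_{BAB} + P_1 + T_B$   $\mu_{22} = \mu + S_{BAB} + P_2 + T_A + \lambda_B$   $\mu_{23} = \mu + S_{BAB} + P_3 + T_B + \lambda_A$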
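Thus even if carryover is present in the ABA, BAB crossover design, the direct difference
in the two treatments can be estimated by

$$\hat{\mu}_{11} - \tfrac{1}{2}\hat{\mu}_{12} - \tfrac{1}{2}\hat{\mu}_{13} - \hat{\mu}_{21} + \tfrac{1}{2}\hat{\mu}_{22} + \tfrac{1}{2}\hat{\mu}_{23} \tag{29.10}$$

Also note that

$$\mathrm{Var}\left(\hat{\mu}_{11} - \tfrac{1}{2}\hat{\mu}_{12} - \tfrac{1}{2}\hat{\mu}_{13} - \hat{\mu}_{21} + \tfrac{1}{2}\hat{\mu}_{22} + \tfrac{1}{2}\hat{\mu}_{23}\right) = \frac{3}{2}\sigma_\varepsilon^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right) \tag{29.11}$$

so the contrast in Equation 29.10 is a within subject contrast whose variance depends only
on $\sigma_\varepsilon^2$.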
One question of interest might be how should treatment main effect means be defined?
One possibility is to estimate the A main effect mean by averaging over all the cells in
Table 29.19 that would receive treatment A. That is, the A mean would be $(\mu_{11} + \mu_{22} + \mu_{13})/3$
and the B mean would be $(\mu_{21} + \mu_{12} + \mu_{23})/3$. In terms of the effects model parameters
defined in Table 29.19, these two functions are equal to

$$\mu + \tfrac{2}{3}S_{ABA} + \tfrac{1}{3}S_{BAB} + \bar{P}_{\cdot} + T_A + \tfrac{2}{3}\lambda_B$$

and

$$\mu + \tfrac{1}{3}S_{ABA} + \tfrac{2}{3}S_{BAB} + \bar{P}_{\cdot} + T_B + \tfrac{2}{3}\lambda_A$$

respectively. Such a definition would not make sense since the difference in such an A
mean and B mean is equal to $\tfrac{1}{3}S_{ABA} - \tfrac{1}{3}S_{BAB} + T_A - T_B + \tfrac{2}{3}(\lambda_B - \lambda_A)$, which is not equal to $T_A - T_B$. That is,
the difference in such a defined A mean and B mean is aliased with sequence and carryover effects. A
second definition of an A main effect mean is

$$\mu + \bar{S}_{\cdot} + \bar{P}_{\cdot} + T_A + \tfrac{1}{3}(\lambda_B + \lambda_A)$$

and a B main effect mean would be similarly defined by

$$\mu + \bar{S}_{\cdot} + \bar{P}_{\cdot} + T_B + \tfrac{1}{3}(\lambda_B + \lambda_A)$$

Such definitions are reasonable since the first is equal to the following linear function of
the cell means in Table 29.19

$$\tfrac{2}{3}\mu_{11} - \tfrac{1}{12}\mu_{12} - \tfrac{1}{12}\mu_{13} - \tfrac{1}{3}\mu_{21} + \tfrac{5}{12}\mu_{22} + \tfrac{5}{12}\mu_{23}$$

and the second is equal to

$$-\tfrac{1}{3}\mu_{11} + \tfrac{5}{12}\mu_{12} + \tfrac{5}{12}\mu_{13} + \tfrac{2}{3}\mu_{21} - \tfrac{1}{12}\mu_{22} - \tfrac{1}{12}\mu_{23}$$

Furthermore, the difference in these two means is

$$\mu_{11} - \tfrac{1}{2}\mu_{12} - \tfrac{1}{2}\mu_{13} - \mu_{21} + \tfrac{1}{2}\mu_{22} + \tfrac{1}{2}\mu_{23} = T_A - T_B$$
Note that this is the same contrast in the cell means as that given in Equation 29.10.
A contrast in the cell means that measures the carryover effect is

$$\mu_{11} - \mu_{13} - \mu_{21} + \mu_{23} = \lambda_A - \lambda_B \qquad (29.12)$$

and the variance of the estimate of this contrast is

$$\mathrm{Var}(\hat{\mu}_{11} - \hat{\mu}_{13} - \hat{\mu}_{21} + \hat{\mu}_{23}) = 2\sigma_\varepsilon^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right) \qquad (29.13)$$

and this contrast is also a within-subjects contrast.
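
The algebra behind Equations 29.10 and 29.12 and the second definition of the A main effect mean can be checked mechanically. The following Python sketch is an illustration only and is not part of the original analysis; the names `CELLS` and `expand` are hypothetical, and the cell-mean expansions are transcribed from Table 29.19 under the assumption that period 1 carries no carryover term.

```python
from fractions import Fraction as F

# Effects-model parameters appearing in each cell mean of Table 29.19.
CELLS = {
    "11": ["mu", "S_ABA", "P1", "tau_A"],
    "12": ["mu", "S_ABA", "P2", "tau_B", "lam_A"],
    "13": ["mu", "S_ABA", "P3", "tau_A", "lam_B"],
    "21": ["mu", "S_BAB", "P1", "tau_B"],
    "22": ["mu", "S_BAB", "P2", "tau_A", "lam_B"],
    "23": ["mu", "S_BAB", "P3", "tau_B", "lam_A"],
}

def expand(coeffs):
    """Multiplier on each model parameter for a linear combination of cell means."""
    total = {}
    for cell, c in coeffs.items():
        for param in CELLS[cell]:
            total[param] = total.get(param, 0) + c
    # Keep only the parameters that survive in the combination.
    return {p: v for p, v in total.items() if v != 0}

half = F(1, 2)
direct = {"11": F(1), "12": -half, "13": -half, "21": F(-1), "22": half, "23": half}
carry = {"11": F(1), "13": F(-1), "21": F(-1), "23": F(1)}
mean_a = {"11": F(2, 3), "12": F(-1, 12), "13": F(-1, 12),
          "21": F(-1, 3), "22": F(5, 12), "23": F(5, 12)}

print(expand(direct))   # only tau_A and tau_B remain
print(expand(carry))    # only lam_A and lam_B remain
print(expand(mean_a))   # multipliers of the second A-mean definition
```

Exact fractions are used so that cancellation of the sequence and period parameters is tested exactly rather than up to rounding.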

To illustrate with an example, consider the data in Table 29.20 where there were three
subjects assigned to the ABA sequence and another three subjects assigned to the BAB
sequence. Thus $n_1 = n_2 = 3$. The SAS code that reads the data and computes the Sequence ×
Period cell means is shown in Table 29.21. The cell means are shown in Table 29.22.

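Thus the estimate of the A main effect mean is given by

$$
\begin{aligned}
&\tfrac{2}{3}\hat{\mu}_{11} - \tfrac{1}{12}\hat{\mu}_{12} - \tfrac{1}{12}\hat{\mu}_{13} - \tfrac{1}{3}\hat{\mu}_{21} + \tfrac{5}{12}\hat{\mu}_{22} + \tfrac{5}{12}\hat{\mu}_{23} \\
&\quad = \tfrac{2}{3}(24.133) - \tfrac{1}{12}(26.533) - \tfrac{1}{12}(23.933) - \tfrac{1}{3}(26.367) + \tfrac{5}{12}(26.233) + \tfrac{5}{12}(24.833) = 24.372
\end{aligned}
$$

and the estimate of the B main effect mean is given by

$$
\begin{aligned}
&-\tfrac{1}{3}\hat{\mu}_{11} + \tfrac{5}{12}\hat{\mu}_{12} + \tfrac{5}{12}\hat{\mu}_{13} + \tfrac{2}{3}\hat{\mu}_{21} - \tfrac{1}{12}\hat{\mu}_{22} - \tfrac{1}{12}\hat{\mu}_{23} \\
&\quad = -\tfrac{1}{3}(24.133) + \tfrac{5}{12}(26.533) + \tfrac{5}{12}(23.933) + \tfrac{2}{3}(26.367) - \tfrac{1}{12}(26.233) - \tfrac{1}{12}(24.833) = 26.306
\end{aligned}
$$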

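TABLE 29.20

Data for an ABA/BAB Crossover Experiment

| Seq | Per | Trt | Person | Y |
|---|---|---|---|---|
| ABA | 1 | A | 1 | 25.1 |
| ABA | 2 | B | 1 | 27.6 |
| ABA | 3 | A | 1 | 24.5 |
| ABA | 1 | A | 2 | 22.0 |
| ABA | 2 | B | 2 | 24.3 |
| ABA | 3 | A | 2 | 21.6 |
| ABA | 1 | A | 3 | 25.3 |
| ABA | 2 | B | 3 | 27.7 |
| ABA | 3 | A | 3 | 25.7 |
| BAB | 1 | B | 4 | 25.5 |
| BAB | 2 | A | 4 | 23.7 |
| BAB | 3 | B | 4 | 24.9 |
| BAB | 1 | B | 5 | 27.4 |
| BAB | 2 | A | 5 | 27.9 |
| BAB | 3 | B | 5 | 24.6 |
| BAB | 1 | B | 6 | 26.2 |
| BAB | 2 | A | 6 | 27.1 |
| BAB | 3 | B | 6 | 25.0 |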
TABLE 29.21

SAS Code to Obtain Seq x Per Means for the Data in Table 29.20

TITLE 'CRSOVR EXAMPLE #29.2 AN ABA/BAB DESIGN';
DATA CRS;
  INPUT SEQ $ PER TRT $ PERSON Y;
  LINES;
ABA 1 A 1 25.1
ABA 2 B 1 27.6
ABA 3 A 1 24.5
ABA 1 A 2 22.0
ABA 2 B 2 24.3
ABA 3 A 2 21.6
ABA 1 A 3 25.3
ABA 2 B 3 27.7
ABA 3 A 3 25.7
BAB 1 B 4 25.5
BAB 2 A 4 23.7
BAB 3 B 4 24.9
BAB 1 B 5 27.4
BAB 2 A 5 27.9
BAB 3 B 5 24.6
BAB 1 B 6 26.2
BAB 2 A 6 27.1
BAB 3 B 6 25.0
;
PROC MEANS;
  CLASS SEQ PER;
  VAR Y;
RUN;


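TABLE 29.22

Sequence by Period Cell Means

| Seq | PER | N | Mean |
|---|---|---|---|
| ABA | 1 | 3 | 24.1333333 |
| ABA | 2 | 3 | 26.5333333 |
| ABA | 3 | 3 | 23.9333333 |
| BAB | 1 | 3 | 26.3666667 |
| BAB | 2 | 3 | 26.2333333 |
| BAB | 3 | 3 | 24.8333333 |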


The difference between the A mean and the B mean is

$$\hat{\mu}_A - \hat{\mu}_B = 24.372 - 26.306 = -1.934$$
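
As a check on the arithmetic, the two main effect means can be recomputed from the cell means in Table 29.22. The short Python sketch below is illustrative only; the dictionary and function names are hypothetical, and the coefficients are those of the second definition of the main effect means given above.

```python
from fractions import Fraction as F

# Sequence-by-period cell means from Table 29.22, keyed by (sequence, period).
cell_means = {
    ("ABA", 1): 24.1333333, ("ABA", 2): 26.5333333, ("ABA", 3): 23.9333333,
    ("BAB", 1): 26.3666667, ("BAB", 2): 26.2333333, ("BAB", 3): 24.8333333,
}

# Coefficients of the cell means in the A and B main effect means.
coef_a = {("ABA", 1): F(2, 3), ("ABA", 2): F(-1, 12), ("ABA", 3): F(-1, 12),
          ("BAB", 1): F(-1, 3), ("BAB", 2): F(5, 12), ("BAB", 3): F(5, 12)}
coef_b = {("ABA", 1): F(-1, 3), ("ABA", 2): F(5, 12), ("ABA", 3): F(5, 12),
          ("BAB", 1): F(2, 3), ("BAB", 2): F(-1, 12), ("BAB", 3): F(-1, 12)}

def combine(coefs):
    """Evaluate a linear combination of the estimated cell means."""
    return sum(float(c) * cell_means[cell] for cell, c in coefs.items())

mean_a = combine(coef_a)
mean_b = combine(coef_b)
print(round(mean_a, 3), round(mean_b, 3), round(mean_a - mean_b, 3))
```

Carrying more decimal places than the hand calculation gives a difference of about -1.933, which agrees with the value reported by the SAS-Mixed procedure; the -1.934 above reflects rounding of the two means before subtracting.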

One can use the SAS-Mixed procedure to estimate the treatment means, test for treatment 
differences, test for carryover, and estimate the variance components for two treatment/ 
three period crossover designs. For example, in order to analyze the data in Table 29.20, 
one must define a new variable that identifies the treatment that was given in the previous 
TABLE 29.23

SAS Code to Create a PRIORTRT Variable for the Data in Table 29.20

DATA CRS2; SET CRS;
  IF SEQ='ABA' AND PER=2 THEN PRIORTRT='A';
  ELSE IF SEQ='ABA' AND PER=3 THEN PRIORTRT='B';
  ELSE IF SEQ='BAB' AND PER=2 THEN PRIORTRT='B';
  ELSE IF SEQ='BAB' AND PER=3 THEN PRIORTRT='A';
  ELSE PRIORTRT='O';
RUN;

PROC PRINT;
  TITLE2 'A PRINT OF THE DATA WITH PRIORTRT VARIABLE INCLUDED';
RUN;


period. So that the first period data are not eliminated, this new variable must also be
defined for the period one data. For the ABA sequence, this new variable could be defined
to take on the values O, A, and B for periods 1, 2, and 3, respectively, and for the BAB
sequence, the new variable would take on the values O, B, and A, respectively. The SAS code in
Table 29.23 creates a new variable, called PRIORTRT, that has the properties given in the
preceding sentences. A display of the data with this new variable defined is shown in
Table 29.24. The SAS-Mixed commands that can be used to analyze the data in Table 29.20
are shown in Table 29.25. The resulting output is shown in Tables 29.26-29.29. Table 29.26
gives estimates of the variance components. From this table, one can see that the estimated
between-person variance component is $\hat{\sigma}^2_{\text{Person}} = 2.1378$
and the estimated residual variance is $\hat{\sigma}^2_\varepsilon = 0.7872$. Table 29.27 gives tests for differences between treatment A and treatment B.
The row of Table 29.27 labeled Trt is a test of $H_{01}: \tau_A = \tau_B$ and the row labeled PRIORTRT is

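TABLE 29.24

Print of New Data with PRIORTRT Variable Defined

| Obs | Seq | Per | Trt | Person | Y | PRIORTRT |
|---|---|---|---|---|---|---|
| 1 | ABA | 1 | A | 1 | 25.1 | O |
| 2 | ABA | 2 | B | 1 | 27.6 | A |
| 3 | ABA | 3 | A | 1 | 24.5 | B |
| 4 | ABA | 1 | A | 2 | 22.0 | O |
| 5 | ABA | 2 | B | 2 | 24.3 | A |
| 6 | ABA | 3 | A | 2 | 21.6 | B |
| 7 | ABA | 1 | A | 3 | 25.3 | O |
| 8 | ABA | 2 | B | 3 | 27.7 | A |
| 9 | ABA | 3 | A | 3 | 25.7 | B |
| 10 | BAB | 1 | B | 4 | 25.5 | O |
| 11 | BAB | 2 | A | 4 | 23.7 | B |
| 12 | BAB | 3 | B | 4 | 24.9 | A |
| 13 | BAB | 1 | B | 5 | 27.4 | O |
| 14 | BAB | 2 | A | 5 | 27.9 | B |
| 15 | BAB | 3 | B | 5 | 24.6 | A |
| 16 | BAB | 1 | B | 6 | 26.2 | O |
| 17 | BAB | 2 | A | 6 | 27.1 | B |
| 18 | BAB | 3 | B | 6 | 25.0 | A |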
TABLE 29.25

SAS Code to Analyze the Data in Table 29.24

PROC MIXED;
  TITLE3 'AN ANALYSIS USING MIXED';
  CLASSES SEQ PERSON TRT PER PRIORTRT;
  MODEL Y=SEQ TRT PER PRIORTRT/DDFM=KR;
  LSMEANS TRT/PDIFF;
  ESTIMATE 'TRT DIFF' TRT 1 -1;
  ESTIMATE 'CARRYOVER' PRIORTRT 1 -1;
  CONTRAST 'PERIOD DIFF' PER 0 1 -1, PER 1 -.5 -.5 PRIORTRT -.5 -.5 1;
  RANDOM PERSON(SEQ);
RUN;


TABLE 29.26

Estimates of the Variance Components

Covariance Parameter Estimates

| Covariance Parameter | Estimate |
|---|---|
| Person(Seq) | 2.1378 |
| Residual | 0.7872 |


TABLE 29.27

Tests on the Fixed Effects

Type III Tests of Fixed Effects

| Effect | Num df | Den df | F-Value | Pr > F |
|---|---|---|---|---|
| Seq | 1 | 4 | 0.05 | 0.8287 |
| Trt | 1 | 8 | 4.75 | 0.0610 |
| Per | 1 | 8 | 15.24 | 0.0045 |
| PRIORTRT | 1 | 8 | 1.69 | 0.2293 |

Contrasts

| Label | Num df | Den df | F-Value | Pr > F |
|---|---|---|---|---|
| Period diff | 2 | 8 | 7.67 | 0.0138 |


a test of $H_{02}: \lambda_A = \lambda_B$. Note that the row of Table 29.27 labeled PER has only one degree of
freedom associated with it. This may seem strange since there are three periods. However,
there is complete confounding between period 1 and $\lambda_0$, so the row labeled PER compares
only periods 2 and 3. That is, the row labeled PER tests $H_{03}: P_2 = P_3$. The CONTRAST option
in Table 29.25 gives a test statistic that compares all three periods to one another; that is,
the results of this CONTRAST option tests $H_{03}: P_1 + \lambda_0 = P_2 = P_3$. The results of this option
are appended to the bottom of Table 29.27.
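
The confounding between period 1 and the "no prior treatment" level can be seen directly from the coding of the prior-treatment variable. The Python sketch below is an illustration, not the book's SAS code; `prior_treatments` is a hypothetical helper that mirrors the PRIORTRT definition of Table 29.23, using the letter O for period 1.

```python
def prior_treatments(sequence):
    """Treatment given in the previous period for each period; 'O' marks period 1."""
    return ["O"] + list(sequence[:-1])

# One row per sequence-by-period cell of the ABA/BAB design.
rows = []
for seq in ("ABA", "BAB"):
    for per, (trt, prior) in enumerate(zip(seq, prior_treatments(seq)), start=1):
        rows.append({"seq": seq, "per": per, "trt": trt, "prior": prior})

# Indicator columns for "period 1" and for "prior treatment is O".
period1 = [int(r["per"] == 1) for r in rows]
prior_o = [int(r["prior"] == "O") for r in rows]

# Identical columns mean the two effects cannot be separated by the data.
print(period1 == prior_o)
```

Because the two indicator columns coincide in every cell, any estimable function must give the period 1 parameter and the O-level carryover parameter the same multiplier, which is why only the comparison of periods 2 and 3 receives a degree of freedom.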

The estimates of the treatment main effect means are shown in Table 29.28. Note that the
values given by the SAS-Mixed procedure are the same as the estimates that were
computed from Table 29.22 using the second definition for main effect means. The results
from the two ESTIMATE statements in Table 29.25 are shown in Table 29.29. The first of
TABLE 29.28

Treatment Main Effect Means

Least Squares Means

| Effect | Trt | Estimate | Standard Error | df | t-Value | Pr > \|t\| |
|---|---|---|---|---|---|---|
| Trt | A | 24.3722 | 0.7726 | 7.94 | 31.55 | <0.0001 |
| Trt | B | 26.3056 | 0.7726 | 7.94 | 34.05 | <0.0001 |


TABLE 29.29

Tests for Treatment and Carryover Effects from Estimate Options

Estimates

| Label | Estimate | Standard Error | df | t-Value | Pr > \|t\| |
|---|---|---|---|---|---|
| Trt diff | -1.9333 | 0.8873 | 8 | -2.18 | 0.0610 |
| Carryover | -1.3333 | 1.0245 | 8 | -1.30 | 0.2293 |


these estimates $\tau_A - \tau_B$, gives the estimate's estimated standard error, and gives a t-statistic
for testing $H_{01}: \tau_A = \tau_B$ as well as the test's observed significance level. The second ESTIMATE
statement estimates $\lambda_A - \lambda_B$, gives the estimate's estimated standard error, and gives a
t-statistic for testing $H_{02}: \lambda_A = \lambda_B$ as well as the test's observed significance level. The reader
should note that the observed significance levels in Table 29.29 for treatment difference
and carryover difference are the same as the corresponding observed significance levels in
Table 29.27. The estimated standard error for the estimated treatment difference can be
computed by substituting $\hat{\sigma}_\varepsilon^2$ for $\sigma_\varepsilon^2$ in (29.11). One gets

$$
\begin{aligned}
\widehat{\mathrm{Var}}\left(\hat{\mu}_{11} - \tfrac{1}{2}\hat{\mu}_{12} - \tfrac{1}{2}\hat{\mu}_{13} - \hat{\mu}_{21} + \tfrac{1}{2}\hat{\mu}_{22} + \tfrac{1}{2}\hat{\mu}_{23}\right)
&= \tfrac{3}{2}\hat{\sigma}_\varepsilon^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right) \\
&= \left(\tfrac{3}{2}\right)(0.7872)\left(\tfrac{1}{3} + \tfrac{1}{3}\right) = 0.7872
\end{aligned}
$$

And thus the estimated standard error of $\hat{\tau}_A - \hat{\tau}_B$ is $\sqrt{0.7872} = 0.8872$.

Similarly, one can get the estimated standard error of $\hat{\lambda}_A - \hat{\lambda}_B$ by substituting $\hat{\sigma}_\varepsilon^2$ for $\sigma_\varepsilon^2$ in
Equation 29.13 and taking the square root of the result. The degrees of freedom associated
with each of these estimated standard errors is $2(n_1 + n_2 - 2) = 8$.
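
Both standard errors follow the same pattern, which the following Python sketch makes explicit. It is a hedged illustration rather than output from SAS: the function name `contrast_variance` is hypothetical, and it assumes a within-subject contrast whose period coefficients sum to zero and are applied with opposite signs in the two sequences, so that the subject effects cancel.

```python
import math

sigma2_e = 0.7872   # residual variance estimate from Table 29.26
n1 = n2 = 3         # subjects per sequence

def contrast_variance(coefs, sigma2, n1, n2):
    """Variance of a within-subject contrast with period coefficients `coefs`
    in one sequence and their negatives in the other sequence."""
    return sum(c * c for c in coefs) * sigma2 * (1 / n1 + 1 / n2)

# Equation 29.11: coefficients (1, -1/2, -1/2) give the factor 3/2.
se_trt = math.sqrt(contrast_variance((1, -0.5, -0.5), sigma2_e, n1, n2))
# Equation 29.13: coefficients (1, 0, -1) give the factor 2.
se_carry = math.sqrt(contrast_variance((1, 0, -1), sigma2_e, n1, n2))
df = 2 * (n1 + n2 - 2)

print(round(se_trt, 4), round(se_carry, 4), df)
```

The two values agree, up to rounding in the last digit, with the standard errors reported in Table 29.29.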


29.4 Crossover Designs with More than Two Treatments 

Next, consider the three treatment/three period crossover design in six sequences. The 
design sequences are ABC, ACB, BAC, BCA, CAB, and CBA. All six of these sequences are 
required in order to get a direct comparison of treatments that will not be aliased with 
carryover. Williams (1949) developed designs that are balanced with respect to carryover 
effects. A Williams crossover design that is balanced for carryover effects has the property 
that every treatment is followed by every other treatment exactly the same number of times. 
The six-sequence design given above is a Williams design. In this six-sequence design,
each treatment (letter) follows each other treatment twice. Note also that, while the
three-sequence design with the sequences ABC, BCA, and CAB is a Latin square design, it is not
a Williams design. In this design, A is followed by B twice, but A is never followed by C. 
Likewise, B is followed by C twice, but B is never followed by A; and C is followed by A 
twice, but C is never followed by B. A Williams design for three treatments in three periods 
must use all six of the sequences ABC, ACB, BAC, BCA, CAB, and CBA. 

Suppose one has four treatments and four periods. Clearly, the 24-sequence design that 
has all 24 possible orderings of the treatments A, B, C, and D will be a Williams design. 
Another Williams design for four treatments in four periods is given by the sequences 
ABCD, BDAC, CADB, and DCBA. Let t be the number of treatments and periods in the
crossover design to be used. If t is an even number, then a Williams design can be
constructed using a single special t × t Latin square design, and if t is odd, a Williams design
can be constructed by using two special t × t Latin square designs.
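
The balance property described above is easy to verify by counting how often each treatment immediately follows each other treatment. The Python sketch below is illustrative; the function names are hypothetical, and the check covers only the carryover balance, not the other properties of a Latin square.

```python
from collections import Counter
from itertools import permutations

def follow_counts(sequences):
    """Count how often each ordered pair (previous, next) occurs in adjacent periods."""
    counts = Counter()
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[(prev, nxt)] += 1
    return counts

def is_carryover_balanced(sequences):
    """True if every treatment is followed by every other treatment equally often."""
    treatments = sorted(set("".join(sequences)))
    counts = follow_counts(sequences)
    values = {counts[(a, b)] for a, b in permutations(treatments, 2)}
    return len(values) == 1 and values != {0}

six_seq = ["ABC", "ACB", "BAC", "BCA", "CAB", "CBA"]
latin_only = ["ABC", "BCA", "CAB"]
four_trt = ["ABCD", "BDAC", "CADB", "DCBA"]

print(is_carryover_balanced(six_seq))     # balanced: each pair occurs twice
print(is_carryover_balanced(latin_only))  # not balanced: A is never followed by C
print(is_carryover_balanced(four_trt))    # balanced: each pair occurs once
```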

Table 29.30 gives the cell mean parameters for the three period/three treatment Williams
design. To show that one can estimate treatment differences when carryover is present,
one can show that for the cell mean definitions in Table 29.30

$$
\begin{aligned}
(&5\mu_{11} - 2\mu_{12} - 3\mu_{13} + 4\mu_{21} + 2\mu_{22} - 6\mu_{23} - 5\mu_{31} + 2\mu_{32} + 3\mu_{33} - 4\mu_{41} - 2\mu_{42} \\
&+ 6\mu_{43} - \mu_{51} + 4\mu_{52} - 3\mu_{53} + \mu_{61} - 4\mu_{62} + 3\mu_{63})/24 = \tau_A - \tau_B
\end{aligned}
\qquad (29.14)
$$

Also note that one can compare carryover effects as

$$
\begin{aligned}
(&\mu_{11} + 2\mu_{12} - 3\mu_{13} + 0\mu_{21} + 2\mu_{22} - 2\mu_{23} - \mu_{31} - 2\mu_{32} + 3\mu_{33} + 0\mu_{41} - 2\mu_{42} \\
&+ 2\mu_{43} - \mu_{51} + 0\mu_{52} + \mu_{53} + \mu_{61} + 0\mu_{62} - \mu_{63})/8 = \lambda_A - \lambda_B
\end{aligned}
\qquad (29.15)
$$

Similar functions of the cell mean parameters that simplify to $\tau_A - \tau_C$ and $\tau_B - \tau_C$ as well as
to $\lambda_A - \lambda_C$ and $\lambda_B - \lambda_C$ can also be obtained.
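
The coefficients in Equations 29.14 and 29.15 can be verified by expanding each cell mean into its model parameters and collecting terms. The Python sketch below is a check written for this purpose, not part of the original text; the helper names are hypothetical, and the cell-mean structure assumed is that of Table 29.30 (first-order carryover, none in period 1).

```python
from fractions import Fraction as F

SEQUENCES = ["ABC", "ACB", "BAC", "BCA", "CAB", "CBA"]

def cell_params(seq_index, period):
    """Model parameters in the cell mean for a sequence (0-5) and period (0-2)."""
    seq = SEQUENCES[seq_index]
    params = ["mu", "S_" + seq, "P%d" % (period + 1), "tau_" + seq[period]]
    if period > 0:
        params.append("lam_" + seq[period - 1])  # carryover from previous period
    return params

def expand(coefs, divisor):
    """Multiplier on each parameter for the cell-mean combination coefs/divisor."""
    total = {}
    for i, row in enumerate(coefs):
        for j, c in enumerate(row):
            for p in cell_params(i, j):
                total[p] = total.get(p, 0) + F(c, divisor)
    return {p: v for p, v in total.items() if v != 0}

# Rows are sequences ABC, ACB, BAC, BCA, CAB, CBA; columns are periods 1-3.
direct = [[5, -2, -3], [4, 2, -6], [-5, 2, 3], [-4, -2, 6], [-1, 4, -3], [1, -4, 3]]
carry = [[1, 2, -3], [0, 2, -2], [-1, -2, 3], [0, -2, 2], [-1, 0, 1], [1, 0, -1]]

print(expand(direct, 24))  # only tau_A and tau_B remain
print(expand(carry, 8))    # only lam_A and lam_B remain
```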

As an example, consider the data in Table 29.31. These data come from a three period/
three treatment crossover design in six sequences. There were a few subjects that had
missing data. A SAS data set was created using the commands in Table 29.32. These
commands also create a carryover parameter that gives the value of the treatment in the
previous period. This parameter has a value of "O" for first period data. The data were
initially analyzed with SAS-Mixed using the commands in Table 29.33. Table 29.34 gives
the estimates of the between-subject and within-subject variance components. That is,
$\hat{\sigma}^2_{\text{Subj}} = 3.2278$ and $\hat{\sigma}^2_\varepsilon = 0.8934$.


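TABLE 29.30

Cell Mean Parameters for a Three Period/Three Treatment Crossover Design in Six Sequences

| Sequence | Period 1 | Period 2 | Period 3 |
|---|---|---|---|
| ABC | $\mu_{11} = \mu + S_{ABC} + P_1 + \tau_A$ | $\mu_{12} = \mu + S_{ABC} + P_2 + \tau_B + \lambda_A$ | $\mu_{13} = \mu + S_{ABC} + P_3 + \tau_C + \lambda_B$ |
| ACB | $\mu_{21} = \mu + S_{ACB} + P_1 + \tau_A$ | $\mu_{22} = \mu + S_{ACB} + P_2 + \tau_C + \lambda_A$ | $\mu_{23} = \mu + S_{ACB} + P_3 + \tau_B + \lambda_C$ |
| BAC | $\mu_{31} = \mu + S_{BAC} + P_1 + \tau_B$ | $\mu_{32} = \mu + S_{BAC} + P_2 + \tau_A + \lambda_B$ | $\mu_{33} = \mu + S_{BAC} + P_3 + \tau_C + \lambda_A$ |
| BCA | $\mu_{41} = \mu + S_{BCA} + P_1 + \tau_B$ | $\mu_{42} = \mu + S_{BCA} + P_2 + \tau_C + \lambda_B$ | $\mu_{43} = \mu + S_{BCA} + P_3 + \tau_A + \lambda_C$ |
| CAB | $\mu_{51} = \mu + S_{CAB} + P_1 + \tau_C$ | $\mu_{52} = \mu + S_{CAB} + P_2 + \tau_A + \lambda_C$ | $\mu_{53} = \mu + S_{CAB} + P_3 + \tau_B + \lambda_A$ |
| CBA | $\mu_{61} = \mu + S_{CBA} + P_1 + \tau_C$ | $\mu_{62} = \mu + S_{CBA} + P_2 + \tau_B + \lambda_C$ | $\mu_{63} = \mu + S_{CBA} + P_3 + \tau_A + \lambda_B$ |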
TABLE 29.31

Data for a Three Period/Three Treatment Crossover Design (a hyphen denotes a missing value)

| Seq | Subj | Y1 | Y2 | Y3 | Seq | Subj | Y1 | Y2 | Y3 |
|---|---|---|---|---|---|---|---|---|---|
| ABC | 11 | 20.1 | 20.3 | - | ACB | 21 | 24.7 | 29.4 | 27.5 |
| ABC | 12 | 23.3 | 24.8 | 28.7 | ACB | 22 | 23.8 | - | 24.1 |
| ABC | 13 | 23.4 | - | 28.3 | ACB | 23 | 23.6 | - | 25.0 |
| ABC | 14 | 19.7 | 21.3 | 25.7 | ACB | 24 | 20.2 | - | - |
| ABC | 15 | 19.2 | 20.9 | 25.9 | ACB | 25 | 19.8 | 23.7 | 23.3 |
| ABC | 16 | 22.2 | 22.0 | - | ACB | 26 | 21.5 | 25.5 | 20.8 |
| BAC | 31 | 24.3 | - | 30.1 | BCA | 41 | 20.9 | 27.5 | 24.3 |
| BAC | 32 | 26.4 | 26.4 | 32.3 | BCA | 42 | 21.9 | 28.6 | 23.1 |
| BAC | 33 | 19.9 | 23.7 | 25.5 | BCA | 43 | 22.0 | 27.4 | - |
| BAC | 34 | 23.9 | 26.8 | 30.8 | BCA | 44 | 23.3 | 30.7 | 26.6 |
| BAC | 35 | 20.5 | 23.2 | 26.3 | BCA | 45 | 18.8 | 27.9 | 24.6 |
| BAC | 36 | 21.8 | 23.6 | - | BCA | 46 | 24.6 | 29.8 | 26.6 |
| CAB | 51 | 24.0 | 21.8 | 21.6 | CBA | 61 | 23.2 | 18.9 | 23.8 |
| CAB | 52 | 25.9 | 23.7 | - | CBA | 62 | 23.9 | 21.5 | 25.4 |
| CAB | 53 | 25.5 | - | 23.4 | CBA | 63 | 28.0 | 25.3 | 28.1 |
| CAB | 54 | 27.9 | 25.4 | 24.4 | CBA | 64 | 24.6 | 22.7 | 23.8 |
| CAB | 55 | 25.3 | 26.4 | 25.8 | CBA | 65 | 27.7 | 23.5 | 25.6 |
| CAB | 56 | 25.7 | - | 24.9 | CBA | 66 | 21.5 | 18.1 | 22.8 |


Table 29.35 gives the tests on the fixed effects. An examination of Table 29.35 shows that
there is a large significant treatment effect (p < 0.0001) and a significant carryover effect
(p = 0.0306). Tables 29.36 and 29.37 give the results of the LSMEANS option. It is interesting
to note that neither the treatment nor the prior treatment least squares means are
estimable. The reason for this is that the SAS-Mixed procedure generates functions of the
model parameters that are not estimable.

For example, the SAS-Mixed procedure defines the treatment A least squares mean as

$$\mu + \frac{S_{ABC} + S_{ACB} + S_{BAC} + S_{BCA} + S_{CAB} + S_{CBA}}{6} + \frac{P_1 + P_2 + P_3}{3} + \tau_A + \frac{\lambda_A + \lambda_B + \lambda_C + \lambda_0}{4}$$

This function is not estimable because $P_1$ and $\lambda_0$ are completely aliased with one another
in the model, and consequently, they both must have the same multipliers. In the above
expression, they do not, as the multiplier on $P_1$ is equal to $\tfrac{1}{3}$ while the multiplier on $\lambda_0$ is
equal to $\tfrac{1}{4}$. A better definition of the treatment A least squares mean is

$$\mu + \frac{S_{ABC} + S_{ACB} + S_{BAC} + S_{BCA} + S_{CAB} + S_{CBA}}{6} + \frac{P_1 + P_2 + P_3}{3} + \tau_A + \frac{2\lambda_A + 2\lambda_B + 2\lambda_C + 3\lambda_0}{9} \qquad (29.16)$$

In this definition, both $P_1$ and $\lambda_0$ have a multiplier equal to $\tfrac{1}{3}$. Furthermore, since the sum
of the multipliers on the four carryover parameters must add to one, the multipliers on $\lambda_A$,
$\lambda_B$, and $\lambda_C$ must each be equal to $\tfrac{2}{9}$. The function in Equation 29.16 is estimable. In a similar
manner, one can define treatment B and treatment C least squares means, and pairwise
differences among these means will not include carryover parameters.
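
Estimability can also be checked numerically: a linear function of the parameters is estimable exactly when its coefficient vector lies in the row space of the cell-means design matrix. The Python sketch below is one way to carry out that check with exact arithmetic; it is an illustration under the model of Table 29.30 with an added O-level carryover parameter for period 1, and all function and variable names are hypothetical.

```python
from fractions import Fraction as F

SEQUENCES = ["ABC", "ACB", "BAC", "BCA", "CAB", "CBA"]
PARAMS = (["mu"] + ["S_" + s for s in SEQUENCES] + ["P1", "P2", "P3"]
          + ["tau_A", "tau_B", "tau_C"] + ["lam_A", "lam_B", "lam_C", "lam_O"])

def cell_row(seq, period):
    """0/1 design row for one sequence-by-period cell (period is 0, 1, or 2)."""
    prior = "O" if period == 0 else seq[period - 1]
    active = {"mu", "S_" + seq, "P%d" % (period + 1), "tau_" + seq[period], "lam_" + prior}
    return [F(int(p in active)) for p in PARAMS]

DESIGN = [cell_row(s, j) for s in SEQUENCES for j in range(3)]

def rank(rows):
    """Rank by Gauss-Jordan elimination with exact fractions."""
    rows = [list(r) for r in rows]
    rk = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for i in range(len(rows)):
            if i != rk and rows[i][col] != 0:
                factor = rows[i][col] / rows[rk][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[rk])]
        rk += 1
    return rk

def is_estimable(vec):
    """A coefficient vector is estimable if appending it does not raise the rank."""
    return rank(DESIGN + [vec]) == rank(DESIGN)

def lsm_a(lam_weights):
    """Coefficient vector of a treatment A mean with the given carryover multipliers."""
    w = {"mu": F(1), "tau_A": F(1), "P1": F(1, 3), "P2": F(1, 3), "P3": F(1, 3)}
    w.update({"S_" + s: F(1, 6) for s in SEQUENCES})
    w.update(lam_weights)
    return [w.get(p, F(0)) for p in PARAMS]

sas_default = lsm_a({k: F(1, 4) for k in ("lam_A", "lam_B", "lam_C", "lam_O")})
eq_29_16 = lsm_a({"lam_A": F(2, 9), "lam_B": F(2, 9), "lam_C": F(2, 9), "lam_O": F(1, 3)})

print(is_estimable(sas_default), is_estimable(eq_29_16))
```

The first vector uses equal multipliers of one-fourth on the four carryover levels and fails the check, while the multipliers of Equation 29.16 pass it.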
TABLE 29.32

SAS Code to Read the Data in Table 29.31 and Create the TRT and PRIORTRT Variables

DATA ONE; INPUT SEQ $ SUBJ @@;
  DO PER=1 TO 3; INPUT Y @@; OUTPUT; END;
LINES;
ABC 11 20.1 20.3 .    ACB 21 24.7 29.4 27.5
ABC 12 23.3 24.8 28.7 ACB 22 23.8 .    24.1
ABC 13 23.4 .    28.3 ACB 23 23.6 .    25.0
ABC 14 19.7 21.3 25.7 ACB 24 20.2 .    .
ABC 15 19.2 20.9 25.9 ACB 25 19.8 23.7 23.3
ABC 16 22.2 22.0 .    ACB 26 21.5 25.5 20.8
BAC 31 24.3 .    30.1 BCA 41 20.9 27.5 24.3
BAC 32 26.4 26.4 32.3 BCA 42 21.9 28.6 23.1
BAC 33 19.9 23.7 25.5 BCA 43 22.0 27.4 .
BAC 34 23.9 26.8 30.8 BCA 44 23.3 30.7 26.6
BAC 35 20.5 23.2 26.3 BCA 45 18.8 27.9 24.6
BAC 36 21.8 23.6 .    BCA 46 24.6 29.8 26.6
CAB 51 24.0 21.8 21.6 CBA 61 23.2 18.9 23.8
CAB 52 25.9 23.7 .    CBA 62 23.9 21.5 25.4
CAB 53 25.5 .    23.4 CBA 63 28.0 25.3 28.1
CAB 54 27.9 25.4 24.4 CBA 64 24.6 22.7 23.8
CAB 55 25.3 26.4 25.8 CBA 65 27.7 23.5 25.6
CAB 56 25.7 .    24.9 CBA 66 21.5 18.1 22.8
;
DATA TWO; SET ONE;
  IF SEQ='ABC' AND PER=1 THEN TRT='A'; IF SEQ='ABC' AND PER=1 THEN PRIORTRT='O';
  IF SEQ='ABC' AND PER=2 THEN TRT='B'; IF SEQ='ABC' AND PER=2 THEN PRIORTRT='A';
  IF SEQ='ABC' AND PER=3 THEN TRT='C'; IF SEQ='ABC' AND PER=3 THEN PRIORTRT='B';
  IF SEQ='ACB' AND PER=1 THEN TRT='A'; IF SEQ='ACB' AND PER=1 THEN PRIORTRT='O';
  IF SEQ='ACB' AND PER=2 THEN TRT='C'; IF SEQ='ACB' AND PER=2 THEN PRIORTRT='A';
  IF SEQ='ACB' AND PER=3 THEN TRT='B'; IF SEQ='ACB' AND PER=3 THEN PRIORTRT='C';
  IF SEQ='BAC' AND PER=1 THEN TRT='B'; IF SEQ='BAC' AND PER=1 THEN PRIORTRT='O';
  IF SEQ='BAC' AND PER=2 THEN TRT='A'; IF SEQ='BAC' AND PER=2 THEN PRIORTRT='B';
  IF SEQ='BAC' AND PER=3 THEN TRT='C'; IF SEQ='BAC' AND PER=3 THEN PRIORTRT='A';
  IF SEQ='BCA' AND PER=1 THEN TRT='B'; IF SEQ='BCA' AND PER=1 THEN PRIORTRT='O';
  IF SEQ='BCA' AND PER=2 THEN TRT='C'; IF SEQ='BCA' AND PER=2 THEN PRIORTRT='B';
  IF SEQ='BCA' AND PER=3 THEN TRT='A'; IF SEQ='BCA' AND PER=3 THEN PRIORTRT='C';
  IF SEQ='CAB' AND PER=1 THEN TRT='C'; IF SEQ='CAB' AND PER=1 THEN PRIORTRT='O';
  IF SEQ='CAB' AND PER=2 THEN TRT='A'; IF SEQ='CAB' AND PER=2 THEN PRIORTRT='C';
  IF SEQ='CAB' AND PER=3 THEN TRT='B'; IF SEQ='CAB' AND PER=3 THEN PRIORTRT='A';
  IF SEQ='CBA' AND PER=1 THEN TRT='C'; IF SEQ='CBA' AND PER=1 THEN PRIORTRT='O';
  IF SEQ='CBA' AND PER=2 THEN TRT='B'; IF SEQ='CBA' AND PER=2 THEN PRIORTRT='C';
  IF SEQ='CBA' AND PER=3 THEN TRT='A'; IF SEQ='CBA' AND PER=3 THEN PRIORTRT='B';
RUN;


TABLE 29.33 

SAS Code to Analyze the Data in Table 29.31 

PROC MIXED; 

CLASSES SUBJ SEQ PER TRT PRIORTRT; 
MODEL Y=SEQ PER TRT PRIORTRT/DDFM=KR; 
RANDOM SUBJ(SEQ); 

LSMEANS TRT PRIORTRT/PDIFF; 

RUN; 
TABLE 29.34

Estimates of the Variance Components

Covariance Parameter Estimates

| Covariance Parameter | Estimate |
|---|---|
| Subj(Seq) | 3.2278 |
| Residual | 0.8934 |


TABLE 29.35

Test on the Fixed Effects

Type III Tests of Fixed Effects

| Effect | Num df | Den df | F-Value | Pr > F |
|---|---|---|---|---|
| Seq | 5 | 31 | 1.07 | 0.3942 |
| Per | 1 | 54.7 | 11.93 | 0.0011 |
| Trt | 2 | 54.3 | 112.07 | <0.0001 |
| PRIORTRT | 2 | 54.6 | 3.72 | 0.0306 |


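TABLE 29.36

Treatment Least Squares Means

Least Squares Means

| Effect | Trt | PRIORTRT | Estimate | Standard Error | df | t-Value | Pr > \|t\| |
|---|---|---|---|---|---|---|---|
| Trt | A | | Nonestimable | - | - | - | - |
| Trt | B | | Nonestimable | - | - | - | - |
| Trt | C | | Nonestimable | - | - | - | - |
| PRIORTRT | | A | Nonestimable | - | - | - | - |
| PRIORTRT | | B | Nonestimable | - | - | - | - |
| PRIORTRT | | C | Nonestimable | - | - | - | - |
| PRIORTRT | | O | Nonestimable | - | - | - | - |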


Even though the treatment least squares means were nonestimable in Table 29.36, the
pairwise differences are estimable, and the pairwise comparisons between the treatment
and carryover parameters are given in Table 29.37. From Table 29.37, one can see that

$$\hat{\tau}_A - \hat{\tau}_B = 0.8195, \quad \hat{\tau}_A - \hat{\tau}_C = -3.0861, \quad \text{and} \quad \hat{\tau}_B - \hat{\tau}_C = -3.9056$$

The corresponding observed significance levels are p = 0.0041, p < 0.0001, and p < 0.0001.
Thus all treatments are significantly different from one another.

Also from Table 29.37, one can see that

$$\hat{\lambda}_A - \hat{\lambda}_B = -0.7706, \quad \hat{\lambda}_A - \hat{\lambda}_C = 0.1842, \quad \text{and} \quad \hat{\lambda}_B - \hat{\lambda}_C = 0.9548$$

The corresponding observed significance levels are p = 0.0557, p = 0.6380, and p = 0.0118.
Thus the carryover effects from treatments A and C are similar to one another; the carryover
from treatment B differs significantly from that of treatment C and differs from that of
treatment A at a level just short of significance at the 5% level.
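
Each t-value in Table 29.37 is simply the estimate divided by its standard error, which gives a quick consistency check on a transcribed table. The Python sketch below is illustrative; the tuple layout and names are assumptions made here, and the numbers are copied from Table 29.37.

```python
# (label, estimate, standard error, reported t-value) from Table 29.37.
comparisons = [
    ("tau_A - tau_B", 0.8195, 0.2737, 2.99),
    ("tau_A - tau_C", -3.0861, 0.2738, -11.27),
    ("tau_B - tau_C", -3.9056, 0.2753, -14.19),
    ("lam_A - lam_B", -0.7706, 0.3942, -1.95),
    ("lam_A - lam_C", 0.1842, 0.3893, 0.47),
    ("lam_B - lam_C", 0.9548, 0.3665, 2.61),
]

def t_statistic(estimate, std_error):
    """Ratio of an estimate to its estimated standard error."""
    return estimate / std_error

for label, est, se, reported in comparisons:
    t = t_statistic(est, se)
    # Flag any row whose recomputed t-value disagrees with the printed one.
    status = "ok" if abs(t - reported) < 0.01 else "check"
    print(f"{label}: t = {t:.2f} (reported {reported:.2f}) {status}")
```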
TABLE 29.37

Pairwise Comparisons among Treatments and Carryover Parameters

Differences of Least Squares Means

| Effect | Trt | PRIORTRT | Trt | PRIORTRT | Estimate | Standard Error | df | t-Value | Pr > \|t\| |
|---|---|---|---|---|---|---|---|---|---|
| Trt | A | | B | | 0.8195 | 0.2737 | 54.3 | 2.99 | 0.0041 |
| Trt | A | | C | | -3.0861 | 0.2738 | 54.5 | -11.27 | <0.0001 |
| Trt | B | | C | | -3.9056 | 0.2753 | 54.2 | -14.19 | <0.0001 |
| PRIORTRT | | A | | B | -0.7706 | 0.3942 | 54.8 | -1.95 | 0.0557 |
| PRIORTRT | | A | | C | 0.1842 | 0.3893 | 54.8 | 0.47 | 0.6380 |
| PRIORTRT | | A | | O | Nonestimable | - | - | - | - |
| PRIORTRT | | B | | C | 0.9548 | 0.3665 | 54.3 | 2.61 | 0.0118 |
| PRIORTRT | | B | | O | Nonestimable | - | - | - | - |
| PRIORTRT | | C | | O | Nonestimable | - | - | - | - |


TABLE 29.38

Additional SAS Code to Compute Estimable Least Squares Means for Treatments

ESTIMATE 'LSM A' INTERCEPT 18 SEQ 3 3 3 3 3 3 PER 6 6 6 TRT 18 0 0
         PRIORTRT 4 4 4 6/DIVISOR=18;
ESTIMATE 'LSM B' INTERCEPT 18 SEQ 3 3 3 3 3 3 PER 6 6 6 TRT 0 18 0
         PRIORTRT 4 4 4 6/DIVISOR=18;
ESTIMATE 'LSM C' INTERCEPT 18 SEQ 3 3 3 3 3 3 PER 6 6 6 TRT 0 0 18
         PRIORTRT 4 4 4 6/DIVISOR=18;
ESTIMATE 'A-B' TRT 1 -1 0;
ESTIMATE 'A-C' TRT 1 0 -1;
ESTIMATE 'B-C' TRT 0 1 -1;
RUN;


TABLE 29.39

Results from the Commands Given in Table 29.38

Estimates

| Label | Estimate | Standard Error | df | t-Value | Pr > \|t\| |
|---|---|---|---|---|---|
| LSM A | 23.7217 | 0.3525 | 45.3 | 67.29 | <0.0001 |
| LSM B | 22.9022 | 0.3511 | 44.7 | 65.23 | <0.0001 |
| LSM C | 26.8078 | 0.3567 | 46.9 | 75.16 | <0.0001 |
| A-B | 0.8195 | 0.2737 | 54.3 | 2.99 | 0.0041 |
| A-C | -3.0861 | 0.2738 | 54.5 | -11.27 | <0.0001 |
| B-C | -3.9056 | 0.2753 | 54.2 | -14.19 | <0.0001 |
| CARRY_A-CARRY_B | -0.7706 | 0.3942 | 54.8 | -1.95 | 0.0557 |
| CARRY_A-CARRY_C | 0.1842 | 0.3893 | 54.8 | 0.47 | 0.6380 |
| CARRY_B-CARRY_C | 0.9548 | 0.3665 | 54.3 | 2.61 | 0.0118 |


Table 29.38 gives additional SAS code that computes treatment least squares means using 
definitions similar to that for treatment A given in (29.16). The results from the SAS 
commands in Table 29.38 are given in Table 29.39. An examination of Table 29.39 reveals 
that treatment C has the largest mean, followed in order by treatments A and B. Also note 
TABLE 29.40

Pain Score

| Subject | Sequence | Response 1 | Response 2 |
|---|---|---|---|
| 1 | AB | 49.6 | 54.0 |
| 2 | AB | 45.5 | 58.6 |
| 3 | AB | 45.1 | 45.1 |
| 4 | AB | 46.5 | 40.7 |
| 5 | AB | 26.1 | 63.9 |
| 6 | AB | 45.6 | 61.6 |
| 7 | AB | 43.0 | 46.4 |
| 8 | AB | 51.2 | 56.7 |
| 9 | AB | 32.0 | 66.7 |
| 10 | AB | 55.3 | 33.3 |
| 11 | AB | 43.9 | 65.2 |
| 12 | AB | 41.4 | 46.6 |
| 21 | BA | 48.7 | 49.4 |
| 22 | BA | 40.8 | 26.9 |
| 23 | BA | 47.6 | 48.6 |
| 24 | BA | 60.5 | 30.3 |
| 25 | BA | 38.7 | 44.6 |
| 26 | BA | 49.7 | 53.1 |
| 27 | BA | 65.5 | 39.9 |
| 28 | BA | 51.5 | 43.4 |
| 29 | BA | 38.6 | 43.1 |
| 30 | BA | 48.8 | 30.4 |
| 31 | BA | 39.4 | 48.8 |
| 32 | BA | 44.6 | 33.9 |
| 33 | BA | 60.1 | 60.5 |
| 34 | BA | 44.8 | 24.7 |
| 35 | BA | 33.8 | 48.1 |
| 36 | BA | 48.2 | 46.1 |


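TABLE 29.41

Milk Yields

| Cow | Sequence | Period 1 | Period 2 | Period 3 |
|---|---|---|---|---|
| 1 | ABC | 38 | 25 | 15 |
| 2 | BCA | 109 | 86 | 39 |
| 3 | CAB | 124 | 72 | 27 |
| 4 | ACB | 86 | 76 | 46 |
| 5 | BAC | 75 | 35 | 34 |
| 6 | CBA | 101 | 63 | 1 |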



TABLE 29.42 

Data from a 13 Treatment/4 Period Crossover Design 
that the pairwise comparisons between the treatment means and carryover effects given in
Table 29.39 are exactly the same as the corresponding comparisons in Table 29.37.


29.5 Summary 

This chapter considered crossover designs where each subject receives a series of
treatments over time. The particular series of treatments that a specific subject receives is called
a sequence. When using crossover designs one needs to consider the issue of carryover. 
If one suspects that carryover might be present, one can include a washout period prior to 
giving the next treatment in a sequence in order to minimize as much as possible the effects 
of carryover. When carryover exists in a two period/two treatment crossover design, then 
one cannot use the data from the second period to estimate treatment differences. However, 
when one is able to use more than two periods and/or more than two treatments, then 
estimates of treatment differences can be obtained whether carryover exists or not. 


29.6 Exercises 

29.1 An experiment was conducted to study a toothpaste's ability to desensitize
sensitive teeth. In Table 29.40, A represents a desensitizing toothpaste and B
represents a control toothpaste. Twelve subjects were assigned to the sequence AB
and another 16 subjects were assigned to the sequence BA. Response 1 corresponds
to a measure of pain on a 0-100 scale for the toothpaste received in period 1, and
response 2 corresponds to the toothpaste received in period 2. Answer each of
the following questions:

1) Do there appear to be any carryover effects? Why or why not?

2) Assuming no carryover, find a 95% confidence interval for the difference 
between the desensitizing toothpaste and the control toothpaste. 

3) Assuming carryover exists, find a 95% confidence interval for the difference 
between the desensitizing toothpaste and the control toothpaste. 

29.2 Table 29.41 contains data on milk yields for cows being fed one of three different 
diets over time. The data are taken from Cochran and Cox (1957). The diets are 
denoted by A, B, and C. 

1) Do there appear to be any carryover effects? Why or why not? 

2) Compare the three diets with one another assuming a no carryover model. 

3) Compare the three diets with one another assuming a model with carryover. 

29.3 Table 29.42 contains data from a 13 treatment/four period crossover design. 
There were 13 sequences of diet treatments with each sequence covering four 
periods. One cow was assigned to each sequence. Answer the following questions 
for the dependent variables ACE average and PROP average. 

1) Do there appear to be any carryover effects? Why or why not? 

2) Compare the 13 diets with one another assuming a no carryover model. 

3) Compare the 13 diets with one another assuming a model with carryover. 
Nested effects can occur in either the design structure or the treatment structure or both
of a designed experiment. For nesting to occur in the design structure, there must be more
than one size of experimental unit where a small experimental unit is nested within a larger
one. Split-plot, repeated measures and hierarchical designs are examples where there is
nesting in the design structure. Nesting in the treatment structure can occur when there
are two or more factors. These factors may be all fixed effects, all random effects, or a
mixture of both. Thus an experiment with nesting in the treatment structure can be modeled
as a fixed, random, or mixed model. The concept of nested factors in the design structure
was introduced in Chapter 5. This chapter presents some examples that demonstrate model
construction, parameter estimation, and hypothesis testing. Nested designs with nesting
in the design structure are often referred to as hierarchical designs.


30.1 Definitions, Assumptions, and Models 

In the treatment structure, the levels of factor B are nested within the levels of factor A if 
each level of B occurs with only one level of factor A. The following examples demonstrate 
nesting in the treatment structure. 

30.1.1 Example 30.1: Companies and Insecticides 

Four chemical companies produce certain insecticides. Company A produces three such
products, companies B and C produce two such products each, and company D produces
four such products. No company produces a product exactly like that of another. The
treatment structure is two-way with company as one factor and product as the other. Such a
treatment structure is shown in Table 30.1, where each level of product occurs within
only one level of company. Thus the levels of product are nested within the levels
of company.
TABLE 30.1

Treatment Structure for the Companies and Insecticide Example

| Company | Product 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| A | X | X | X | | | | | | | | |
| B | | | | X | X | | | | | | |
| C | | | | | | X | X | | | | |
| D | | | | | | | | X | X | X | X |

Note: An "X" denotes that particular product is from the corresponding company.


The levels of both factors in the treatment structure are fixed effects. To conduct the
experiment, a box of soil containing live bluegrass plants and 400 mosquitoes were put
into each of 33 glass containers. Three glass containers were then randomly assigned to
each product. The glass containers were treated with the product, and after 4 h the number
of live mosquitoes was counted.

The design structure for this experiment is completely randomized. A model that can be
used to describe the number of live mosquitoes from each container is

$$y_{ijk} = \mu + \gamma_i + \rho_{j(i)} + \varepsilon_{ijk}, \quad i = 1, 2, 3, 4, \quad j = 1, \ldots, m_i, \quad k = 1, 2, 3 \qquad (30.1)$$

where $m_i$ = 2, 3, or 4 is the number of products of the ith company, $y_{ijk}$ is the observed
number of live mosquitoes from the kth replication of the jth product
of the ith company, $\mu$ is the overall mean, $\gamma_i$ is the effect of the ith company, $\rho_{j(i)}$ is the effect
of the jth product in company i, and $\varepsilon_{ijk} \sim$ i.i.d. $N(0, \sigma_\varepsilon^2)$ denotes the error associated with
measuring $y_{ijk}$. The model has only one size of experimental unit and thus only one error
term. The design of the experiment is a two-way nested treatment structure (both factors
are fixed effects) in a completely randomized design structure. The parameters of the
model to be estimated are $\sigma_\varepsilon^2$ and estimable functions of $\mu$, $\gamma_i$, and $\rho_{j(i)}$. The data for this
example are in Table 30.2, and the analysis is discussed in Sections 30.2 and 30.3.
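
To make the structure of model (30.1) concrete, the Python sketch below partitions the total sum of squares for the data of Table 30.2 into company, product-within-company, and error pieces. It is an illustration only, not the analysis of Sections 30.2 and 30.3; the variable names are hypothetical, and it assumes that each row of Table 30.2 is one replicate container and each column one product within the company.

```python
# Live-mosquito counts: company -> list of products -> three replicate containers.
data = {
    "A": [[151, 135, 137], [118, 132, 135], [131, 137, 121]],
    "B": [[140, 152, 133], [151, 132, 139]],
    "C": [[96, 108, 94], [84, 87, 82]],
    "D": [[79, 74, 73], [67, 78, 63], [90, 81, 96], [82, 89, 98]],
}

def mean(values):
    return sum(values) / len(values)

all_obs = [y for products in data.values() for reps in products for y in reps]
grand = mean(all_obs)

ss_company = ss_product = ss_error = 0.0
for products in data.values():
    company_obs = [y for reps in products for y in reps]
    company_mean = mean(company_obs)
    # Between-company variation, weighted by the number of containers.
    ss_company += len(company_obs) * (company_mean - grand) ** 2
    for reps in products:
        product_mean = mean(reps)
        # Products are compared only with other products of the same company.
        ss_product += len(reps) * (product_mean - company_mean) ** 2
        ss_error += sum((y - product_mean) ** 2 for y in reps)

ss_total = sum((y - grand) ** 2 for y in all_obs)
n_products = sum(len(products) for products in data.values())
df_company = len(data) - 1
df_product = n_products - len(data)
df_error = len(all_obs) - n_products

print(df_company, df_product, df_error)
print(round(ss_company, 2), round(ss_product, 2), round(ss_error, 2), round(ss_total, 2))
```

The degrees of freedom (3 for company, 7 for product within company, 22 for error) reflect the nesting: the 11 products contribute 10 degrees of freedom in total, of which 3 are attributed to differences among companies.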


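TABLE 30.2

Data for Example 30.1

| Company | Product 1 | Product 2 | Product 3 | Product 4 |
|---|---|---|---|---|
| A | 151 | 118 | 131 | |
| | 135 | 132 | 137 | |
| | 137 | 135 | 121 | |
| B | 140 | 151 | | |
| | 152 | 132 | | |
| | 133 | 139 | | |
| C | 96 | 84 | | |
| | 108 | 87 | | |
| | 94 | 82 | | |
| D | 79 | 67 | 90 | 82 |
| | 74 | 78 | 81 | 89 |
| | 73 | 63 | 96 | 98 |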
30.1.2 Example 30.2: Comfort Experiment Revisited

The comfort experiment in Example 5.10 is an example of nesting in the design structure.
The large experimental unit is an environmental chamber, and the small experimental
unit is a person. The model is

$$y_{ijkm} = \mu_{ik} + c_{j(i)} + p_{m(ijk)}, \quad i = 65, 70, 75, \quad j = 1, 2, \ldots, 9, \quad k = M, F, \quad m = 1, 2 \qquad (30.2)$$

where $\mu_{ik}$ denotes the mean of the ith temperature and the kth gender, $c_{j(i)}$ denotes the
random effect of the jth chamber assigned to the ith temperature, where it is assumed that
$c_{j(i)} \sim N(0, \sigma^2_{\text{chamber}})$, and $p_{m(ijk)}$ denotes the random effect of the mth person of the kth gender
assigned to the jth chamber of the ith temperature, where it is assumed that $p_{m(ijk)} \sim N(0, \sigma^2_{\text{person}})$.
The terms $c_{j(i)}$ and $p_{m(ijk)}$ denote the errors of observing temperature i on chamber
j (chambers are nested within temperature) and person m of the kth gender within the
jth chamber of the ith temperature (persons are nested within chambers). Since there are
two sizes of experimental units, the analysis has two levels and two error terms. The
parameters that need to be estimated are $\mu_{ik}$, i = 65, 70, 75, k = M, F, $\sigma^2_{\text{chamber}}$, and $\sigma^2_{\text{person}}$. The
analysis of this example is discussed in Sections 30.2 and 30.3.


30.1.3 Example 30.3: Coffee Price Example Revisited 

The coffee price example in Chapter 18 is a design of a sample survey with a three-way
treatment structure, where the levels of store are nested within the levels of city, which are
nested within the levels of state (all three factors are random effects). The model used to
describe the variability in the coffee prices in Chapter 18 was

$$y_{ijk} = \mu + s_i + c_{j(i)} + a_{k(ij)}, \quad i = 1, 2, \ldots, r, \quad j = 1, 2, \ldots, t_i, \quad k = 1, 2, \ldots, n_{ij}$$

where $\mu$ denotes the average price of coffee in the United States, and $s_i \sim \text{i.i.d. } N(0, \sigma^2_{\text{State}})$,
$c_{j(i)} \sim \text{i.i.d. } N(0, \sigma^2_{\text{City}})$, and $a_{k(ij)} \sim \text{i.i.d. } N(0, \sigma^2_{\text{Store}})$ denote the random state, city, and store
effects, respectively. The parameters of interest are the variance components $\sigma^2_{\text{State}}$, $\sigma^2_{\text{City}}$,
$\sigma^2_{\text{Store}}$, and the overall mean, $\mu$. The coffee price example is an example of a multistage
sampling experiment, a very common application of nested designs. The analysis of this
example will be discussed in the next two sections. Table 30.3 contains coffee prices for a
small study that is used to demonstrate parameter estimation, confidence interval
estimation, and hypothesis testing.

The above examples demonstrate the nesting that can occur in the treatment structure, 
design structure, or both. The treatment structure can involve random and/or fixed effects, 
and the design structure can involve several sizes of experimental units. Thus, the analysis 
of such designs involves using the techniques of fixed effects models, random effects models, 
and mixed effects models as discussed in the next two sections. 


30.2 Parameter Estimation 

A model involving nesting falls into one of the classes of models already discussed. That
is, if there are fixed effects, the means need to be estimated; if there are random effects,
variance components need to be estimated; and if there are several sizes of experimental
units, the analysis involves more than one error term. Thus the design may be quite simple
or quite complex. The examples in this chapter are used to demonstrate the application of
the previously discussed techniques to the analysis of designs involving nesting.


TABLE 30.3

Coffee Prices in U.S. Cents per Pound for Example 30.3

State   City   Store 1   Store 2   Store 3   Store 4   Store 5   Store 6
1       1      158       146       158       —         —         —
1       2      140       142       144       140       152       —
1       3      156       162       148       156       150       164
2       1      162       162       166       162       166       170
2       2      166       156       168       162       162       168
3       1      142       150       154       156       —         —
3       2      128       132       132       134       126       132
3       3      164       164       146       164       162       —
3       4      140       146       142       138       146       142
4       1      148       138       144       154       150       146
4       2      140       138       150       136       148       —
5       1      156       152       144       148       152       —
5       2      124       148       140       136       —         —
5       3      134       148       144       144       142       —
5       4      148       148       148       158       —         —


30.2.1 Example 30.1: Continuation

The estimable functions of model (30.1) involve linear combinations of $\mu + \gamma_i + \rho_{j(i)}$. For
example, contrasts of the $\rho_{j(i)}$ for each $i$ are estimable, where $\rho_{1(i)} - \rho_{2(i)}$ is a contrast of the
products within the same company; in this case the contrast compares products 1
and 2 from the $i$th company. Contrasts of $\mu + \gamma_i + \bar{\rho}_{\cdot(i)}$ are used to compare companies
averaged across each company's products, where $\gamma_1 + \bar{\rho}_{\cdot(1)} - \gamma_2 - \bar{\rho}_{\cdot(2)}$ compares company 1
to company 2. The contrast $\gamma_1 + \rho_{1(1)} - \gamma_2 - \rho_{1(2)}$ compares product 1 of company 1 to product 1
of company 2. The estimates of these contrasts are

$$\hat{\rho}_{1(i)} - \hat{\rho}_{2(i)} = \bar{y}_{i1\cdot} - \bar{y}_{i2\cdot},$$

$$\hat{\gamma}_1 + \hat{\bar{\rho}}_{\cdot(1)} - \hat{\gamma}_2 - \hat{\bar{\rho}}_{\cdot(2)} = \bar{y}_{1\cdot\cdot} - \bar{y}_{2\cdot\cdot},$$

and

$$\hat{\gamma}_1 + \hat{\rho}_{1(1)} - \hat{\gamma}_2 - \hat{\rho}_{1(2)} = \bar{y}_{11\cdot} - \bar{y}_{21\cdot},$$

respectively.

Model (30.1) can also be expressed as

$$y_{ijk} = \mu_{j(i)} + \varepsilon_{ijk}, \quad i = 1, 2, 3, 4, \quad j = 1, \ldots, p_i \ (p_i = 2, 3, \text{ or } 4), \quad k = 1, 2, 3$$

where $\mu_{j(i)}$ is the mean of product $j$ of company $i$. The estimators of the $\mu_{j(i)}$ are the $\bar{y}_{ij\cdot}$. Any
linear combination, including contrasts, of the $\mu_{j(i)}$ is estimable, but one has to be careful in
interpreting selected contrasts. The estimator of $\sigma_\varepsilon^2$ comes from pooling the variances of
the observations within the $j(i)$ combinations as

$$\hat{\sigma}_\varepsilon^2 = \frac{1}{N - q} \sum_{i=1}^{m} \sum_{j=1}^{p_i} \sum_{k=1}^{n_{j(i)}} (y_{ijk} - \bar{y}_{ij\cdot})^2$$

where there are $n_{j(i)}$ observations in the $j(i)$ cell, there are $q$ cells with $n_{j(i)} > 0$, $m$ is the
number of companies, $p_i$ is the number of products from the $i$th company, $n_{j(i)}$ is the number
of glass containers assigned to the $j$th product of the $i$th company, and $N = \sum_i \sum_j n_{j(i)}$ is the
total number of observations.

For this example,

$$\hat{\sigma}_\varepsilon^2 = \frac{1}{33 - 11} \sum_{i=1}^{4} \sum_{j=1}^{p_i} \sum_{k=1}^{3} (y_{ijk} - \bar{y}_{ij\cdot})^2 = \frac{1328.00}{22} = 60.36$$

and the estimates of the sample means are in Table 30.4.


TABLE 30.4

Estimates of the Means for Example 30.1

Company          $\hat{\mu}_{1(i)}$   $\hat{\mu}_{2(i)}$   $\hat{\mu}_{3(i)}$   $\hat{\mu}_{4(i)}$
Company A or 1   141.0    128.3    129.7
Company B or 2   141.7    140.7
Company C or 3    99.3     84.3
Company D or 4    75.3     69.3     89.0     89.7
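The cell means and the pooled variance can be checked directly from the counts in Table 30.2. The following is a minimal sketch using only the Python standard library; the dictionary layout and the variable names (`counts`, `cell_means`, `sigma2_hat`) are illustrative choices, not part of the original analysis.

```python
from statistics import mean

# Table 30.2: live mosquito counts keyed by (company, product); three containers per cell.
counts = {
    ("A", 1): [151, 135, 137], ("A", 2): [118, 132, 135], ("A", 3): [131, 137, 121],
    ("B", 1): [140, 152, 133], ("B", 2): [151, 132, 139],
    ("C", 1): [96, 108, 94],   ("C", 2): [84, 87, 82],
    ("D", 1): [79, 74, 73],    ("D", 2): [67, 78, 63],
    ("D", 3): [90, 81, 96],    ("D", 4): [82, 89, 98],
}

# Estimates of the cell means mu_{j(i)}: the sample mean of each product within company.
cell_means = {cell: mean(ys) for cell, ys in counts.items()}

n_total = sum(len(ys) for ys in counts.values())   # N, total number of observations
n_cells = len(counts)                              # q, number of non-empty cells

# Pool the within-cell sums of squares and divide by N - q.
ss_error = sum((y - cell_means[cell]) ** 2 for cell, ys in counts.items() for y in ys)
sigma2_hat = ss_error / (n_total - n_cells)

for cell, m in cell_means.items():
    print(cell, round(m, 1))
print("SSE =", round(ss_error, 2), "pooled variance =", round(sigma2_hat, 2))
```

With the data as transcribed, the within-cell sum of squares is 1328 on 22 degrees of freedom, which agrees with the residual line of Table 30.11.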

Multiple comparisons can be made between the $\mu_{j(i)}$. Other hypotheses can be tested
about the $\mu_{j(i)}$, such as comparing companies or products within a company. For example,
one could compare company B to company D by considering

$$\bar{\mu}_{\cdot(2)} - \bar{\mu}_{\cdot(4)}, \quad \text{where} \quad \bar{\mu}_{\cdot(2)} = \frac{\mu_{1(2)} + \mu_{2(2)}}{2} \quad \text{and} \quad \bar{\mu}_{\cdot(4)} = \frac{\mu_{1(4)} + \mu_{2(4)} + \mu_{3(4)} + \mu_{4(4)}}{4},$$

that is, the comparison would be between the mean of the two products from company B
and the mean of the four products from company D. The researcher will need to decide
whether such a comparison would be of interest. Hypothesis testing for this nested
treatment structure is discussed in Section 30.3.

If there are unequal numbers of observations in the $j(i)$ cells, then the techniques for
analyzing unbalanced models can be used to obtain estimates of the $\mu_{j(i)}$. The estimates
of the population marginal means provide estimates of the $\mu_{j(i)}$, and the estimate of $\sigma_\varepsilon^2$ is
obtained from pooling the variances across the treatment combinations, which can be
obtained from an analysis of variance.
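As a worked illustration of the company B versus company D comparison, the sketch below estimates the difference of the two company means and its standard error. It assumes the usual fixed effects formula for the standard error of a linear combination of independent cell means, $\sqrt{\hat{\sigma}_\varepsilon^2 \sum c^2/n}$, and takes the pooled variance from the residual line of Table 30.11; the variable names are illustrative.

```python
from math import sqrt

# Cell means for companies B and D, computed from the cell totals in Table 30.2.
means_b = [425 / 3, 422 / 3]
means_d = [226 / 3, 208 / 3, 267 / 3, 269 / 3]
n_per_cell = 3                     # glass containers per product
sigma2_hat = 1328.0 / 22           # residual mean square (Table 30.11)

# Estimate of the difference between the company B and company D means.
estimate = sum(means_b) / len(means_b) - sum(means_d) / len(means_d)

# Contrast coefficients: 1/2 on each B product and -1/4 on each D product.
coefs = [1 / len(means_b)] * len(means_b) + [-1 / len(means_d)] * len(means_d)

# Standard error of a linear combination of independent cell means.
std_error = sqrt(sigma2_hat * sum(c * c / n_per_cell for c in coefs))

print(round(estimate, 2), round(std_error, 2))
```

The estimate is about 60.3 live mosquitoes with a standard error of about 3.9, based on the 22 residual degrees of freedom.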


30.2.2 Example 30.2: Continuation 

Data for the comfort study are given in Table 30.5. The nesting occurs in the design structure
with person nested within chamber and chamber nested within temperature. The analysis
of variance table with expected mean squares is shown in Table 30.6. The method-of-
moments estimates of the two components of variance (which are also REML and MINQUE0
estimates) are $\hat{\sigma}^2_{\text{Person}} = 1.65$ and $\hat{\sigma}^2_{\text{Chamber}} = 2.36$, and the maximum likelihood estimates are
$\hat{\sigma}^2_{\text{Person}} = 1.47$ and $\hat{\sigma}^2_{\text{Chamber}} = 1.48$. The methods used for split-plot experiments (described in
Chapter 24) can be used to compare the $\mu_{ik}$. If the design is unbalanced, then a mixed model
analysis using REML to estimate the variance components would be appropriate.


TABLE 30.5

Data for Example 30.2 Where Values Are Comfort Scores, Where 1 = Cold,
8 = Comfortable, and 15 = Hot

Temperature   Gender    Chamber 1   Chamber 2   Chamber 3
65            Male       5   4       5   4       4   2
              Female     1   2       5   5       1   3

Temperature   Gender    Chamber 4   Chamber 5   Chamber 6
70            Male       8   8       6   3       5   7
              Female    10   7       8   8       8   8

Temperature   Gender    Chamber 7   Chamber 8   Chamber 9
75            Male      12   8       8   7       6   6
              Female    11  13       8   8       6   7


TABLE 30.6

Analysis of Variance Table for Comfort Study of Example 30.2

Source                  df    MS       EMS
Temperature              2    79.19    $\sigma^2_{\text{Person}} + 4\sigma^2_{\text{Chamber}} + \phi^2(\text{Temp})$
Gender                   1     3.36    $\sigma^2_{\text{Person}} + \phi^2(\text{Gender})$
Temperature x Gender     2     7.86    $\sigma^2_{\text{Person}} + \phi^2(\text{Temp} \times \text{Gender})$
Chamber(Temperature)     6    11.08    $\sigma^2_{\text{Person}} + 4\sigma^2_{\text{Chamber}}$
Person(Chamber)         24     1.65    $\sigma^2_{\text{Person}}$
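The mean squares in Table 30.6 and the method-of-moments estimates can be reproduced from the scores in Table 30.5. The sketch below is one way to do the arithmetic for this balanced design with the Python standard library; the data layout and helper name `group_term` are illustrative assumptions, and the estimates follow from equating the mean squares to the expected mean squares in Table 30.6.

```python
# Table 30.5 comfort scores: scores[temperature][chamber][gender] -> two persons.
scores = {
    65: {1: {"M": [5, 4], "F": [1, 2]}, 2: {"M": [5, 4], "F": [5, 5]}, 3: {"M": [4, 2], "F": [1, 3]}},
    70: {4: {"M": [8, 8], "F": [10, 7]}, 5: {"M": [6, 3], "F": [8, 8]}, 6: {"M": [5, 7], "F": [8, 8]}},
    75: {7: {"M": [12, 8], "F": [11, 13]}, 8: {"M": [8, 7], "F": [8, 8]}, 9: {"M": [6, 6], "F": [6, 7]}},
}

# Flatten to (temperature, chamber, gender, score) records.
records = [(t, c, g, y) for t, chambers in scores.items()
           for c, by_gender in chambers.items()
           for g, ys in by_gender.items() for y in ys]

def group_term(key):
    """Sum over groups of (group total)^2 / (group size)."""
    totals = {}
    for rec in records:
        tot, n = totals.get(key(rec), (0, 0))
        totals[key(rec)] = (tot + rec[3], n + 1)
    return sum(tot * tot / n for tot, n in totals.values())

cf = group_term(lambda r: 0)                 # correction factor (grand total)^2 / N
t_temp = group_term(lambda r: r[0])
t_gender = group_term(lambda r: r[2])
t_cell = group_term(lambda r: (r[0], r[2]))  # temperature x gender cells
t_chamber = group_term(lambda r: r[1])
t_raw = sum(r[3] ** 2 for r in records)

ms_temp = (t_temp - cf) / 2
ms_gender = (t_gender - cf) / 1
ms_tg = (t_cell - t_temp - t_gender + cf) / 2
ms_chamber = (t_chamber - t_temp) / 6                 # Chamber(Temperature)
ms_person = (t_raw - t_chamber - t_cell + t_temp) / 24  # Person(Chamber), the residual

# Method of moments: equate mean squares to the EMS column of Table 30.6.
var_person = ms_person
var_chamber = (ms_chamber - ms_person) / 4

# Temperature is tested against chambers; gender effects against persons.
f_temp = ms_temp / ms_chamber
f_gender = ms_gender / ms_person
f_tg = ms_tg / ms_person
print(round(var_person, 2), round(var_chamber, 2), round(f_temp, 2))
```

The divisor 4 is the number of persons per chamber, the coefficient of $\sigma^2_{\text{Chamber}}$ in the expected mean squares.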


30.2.3 Example 30.3: Continuation 

The coffee price study involves a random effects model whose parameters of interest are
$\mu$, $\sigma^2_{\text{State}}$, $\sigma^2_{\text{City}}$, and $\sigma^2_{\text{Store}}$. A method of maximum likelihood would provide an analysis and
estimators of the parameters, as would the MINQUE0 technique. A method-of-moments
analysis employing type I sums of squares has typically been used for this type of
multistage sampling design. The results of the REML method are preferred. Table 30.7 contains
the type I sums of squares, their expected mean squares, and the method-of-moments
estimators for a general study. Table 30.8 contains the type I analysis for the data in Table 30.3.
Estimates of the variance components using method-of-moments from type I sums of
squares, REML, MIVQUE0, and ML are in Table 30.9. The 95% confidence intervals for the
variance components were computed using the chi-square distribution with the stated
number of degrees of freedom. Wald's method was used for the intervals provided by the
method of moments, but they were recomputed using the chi-square distribution with
degrees of freedom computed as $2(Z\text{-value})^2$. The estimates of the mean coffee price from
each of the methods of estimating the variance components are given in Table 30.10.


TABLE 30.7

Analysis of Variance Table with Type I Sums of Squares and Method of Moments
Estimators for Example 30.3

State
    df:   $r - 1$
    SS:   $\sum_{i=1}^{r} n_{i\cdot}\,\bar{y}_{i\cdot\cdot}^2 - n_{\cdot\cdot}\,\bar{y}_{\cdot\cdot\cdot}^2$
    EMS:  $\sigma^2_{\text{Store}} + \dfrac{a_1 - a_3}{r - 1}\,\sigma^2_{\text{City}} + \dfrac{n_{\cdot\cdot} - a_2}{r - 1}\,\sigma^2_{\text{State}}$

City(State)
    df:   $\sum_{i=1}^{r} (t_i - 1) = t_\cdot - r$
    SS:   $\sum_{i=1}^{r}\sum_{j=1}^{t_i} n_{ij}\,\bar{y}_{ij\cdot}^2 - \sum_{i=1}^{r} n_{i\cdot}\,\bar{y}_{i\cdot\cdot}^2$
    EMS:  $\sigma^2_{\text{Store}} + \dfrac{n_{\cdot\cdot} - a_1}{t_\cdot - r}\,\sigma^2_{\text{City}}$

Store(City State)
    df:   $\sum_{i=1}^{r}\sum_{j=1}^{t_i} (n_{ij} - 1) = n_{\cdot\cdot} - t_\cdot$
    SS:   $\sum_{i=1}^{r}\sum_{j=1}^{t_i}\sum_{k=1}^{n_{ij}} y_{ijk}^2 - \sum_{i=1}^{r}\sum_{j=1}^{t_i} n_{ij}\,\bar{y}_{ij\cdot}^2$
    EMS:  $\sigma^2_{\text{Store}}$

Source: Searle (1987).

$$a_1 = \sum_{i=1}^{r} \frac{\sum_{j=1}^{t_i} n_{ij}^2}{n_{i\cdot}}, \quad a_2 = \frac{\sum_{i=1}^{r} n_{i\cdot}^2}{n_{\cdot\cdot}}, \quad a_3 = \frac{\sum_{i=1}^{r}\sum_{j=1}^{t_i} n_{ij}^2}{n_{\cdot\cdot}}$$

$$\hat{\sigma}^2_{\text{Store}} = \text{MSSTORE(City, State)}$$

$$\hat{\sigma}^2_{\text{City}} = \frac{\text{MSCITY(STATE)} - \hat{\sigma}^2_{\text{Store}}}{(n_{\cdot\cdot} - a_1)/(t_\cdot - r)}$$

$$\hat{\sigma}^2_{\text{State}} = \frac{\text{MSSTATE} - \hat{\sigma}^2_{\text{Store}} - [(a_1 - a_3)/(r - 1)]\,\hat{\sigma}^2_{\text{City}}}{(n_{\cdot\cdot} - a_2)/(r - 1)}$$


TABLE 30.8

Type I Sums of Squares, Mean Squares, Expected Mean Squares and Test Statistics for the Coffee
Price Data

Source         df    SS         MS        Error df    F-Value    Pr > F
State           4    3658.67    914.67     9.9         2.40      0.1204
City(State)    10    3579.02    357.90    61.0        11.58      <0.0001
Residual       61    1885.47     30.91     —           —         —

Expected mean squares:
    State:         Var(Residual) + 5.3113 Var[City(State)] + 14.967 Var(State)
    City(State):   Var(Residual) + 4.9518 Var[City(State)]
    Residual:      Var(Residual)

Error terms:
    State:         1.0726 MS[City(State)] - 0.0726 MS(Residual)
    City(State):   MS(Residual)
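The type I analysis in Table 30.8 and the method-of-moments estimates can be reproduced from the prices in Table 30.3. The sketch below assumes the Table 30.7 quantities take the form $a_1 = \sum_i \sum_j n_{ij}^2 / n_{i\cdot}$, $a_2 = \sum_i n_{i\cdot}^2 / n_{\cdot\cdot}$, and $a_3 = \sum_i \sum_j n_{ij}^2 / n_{\cdot\cdot}$, which reproduces the coefficients 4.9518, 5.3113, and 14.967 printed in Table 30.8; the data layout and variable names are illustrative.

```python
# Table 30.3: prices[state][city] -> store prices in cents per pound.
prices = {
    1: {1: [158, 146, 158], 2: [140, 142, 144, 140, 152], 3: [156, 162, 148, 156, 150, 164]},
    2: {1: [162, 162, 166, 162, 166, 170], 2: [166, 156, 168, 162, 162, 168]},
    3: {1: [142, 150, 154, 156], 2: [128, 132, 132, 134, 126, 132],
        3: [164, 164, 146, 164, 162], 4: [140, 146, 142, 138, 146, 142]},
    4: {1: [148, 138, 144, 154, 150, 146], 2: [140, 138, 150, 136, 148]},
    5: {1: [156, 152, 144, 148, 152], 2: [124, 148, 140, 136],
        3: [134, 148, 144, 144, 142], 4: [148, 148, 148, 158]},
}

r = len(prices)                                              # number of states
t_dot = sum(len(cities) for cities in prices.values())       # total number of cities
n_ij = {(s, c): len(ys) for s, cities in prices.items() for c, ys in cities.items()}
n_i = {s: sum(len(ys) for ys in cities.values()) for s, cities in prices.items()}
n_dd = sum(n_i.values())                                     # total number of stores

# Uncorrected sums of squares at each level of the hierarchy.
grand_total = sum(sum(ys) for cities in prices.values() for ys in cities.values())
cf = grand_total ** 2 / n_dd
state_term = sum(sum(sum(ys) for ys in cities.values()) ** 2 / n_i[s]
                 for s, cities in prices.items())
city_term = sum(sum(ys) ** 2 / len(ys) for cities in prices.values() for ys in cities.values())
raw_term = sum(y * y for cities in prices.values() for ys in cities.values() for y in ys)

# Type I (sequential) sums of squares and mean squares.
ss_state, ss_city, ss_store = state_term - cf, city_term - state_term, raw_term - city_term
ms_state = ss_state / (r - 1)
ms_city = ss_city / (t_dot - r)
ms_store = ss_store / (n_dd - t_dot)

# Coefficients of the variance components in the expected mean squares.
a1 = sum(n * n / n_i[s] for (s, c), n in n_ij.items())
a2 = sum(n * n for n in n_i.values()) / n_dd
a3 = sum(n * n for n in n_ij.values()) / n_dd
k_city = (n_dd - a1) / (t_dot - r)
k_state_city = (a1 - a3) / (r - 1)
k_state = (n_dd - a2) / (r - 1)

# Method-of-moments estimators, solved from the bottom of the table upward.
var_store = ms_store
var_city = (ms_city - var_store) / k_city
var_state = (ms_state - var_store - k_state_city * var_city) / k_state
print(round(var_state, 1), round(var_city, 1), round(var_store, 1))
```

Solving from the residual line upward mirrors the order of the estimators listed with Table 30.7; the results agree with the Type I rows of Table 30.9.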


30.3 Testing Hypotheses and Confidence Interval Construction 

For balanced designs with nesting in one or both of the structures, the methods for balanced
random effects models, balanced mixed effects models, and balanced split-plot-repeated
measures models can be used to construct confidence intervals and make comparisons
between parameters. For unbalanced nested designs, the REML method of estimating the
variance components provides a good method to construct confidence intervals using
the Satterthwaite approximation. In this case, the approximate degrees of freedom are
computed as $2(Z\text{-value})^2$. Confidence intervals constructed using type I SS estimates,
REML estimates, ML estimates, and MINQUE0 estimates are displayed in Table 30.9 for
the coffee price data. The method of moments can be effectively used to provide tests of
hypotheses concerning the variance components. Table 30.8 uses the expected mean
squares to test $H_0: \sigma^2_{\text{State}} = 0$ vs $H_a: \sigma^2_{\text{State}} > 0$ and $H_0: \sigma^2_{\text{City}} = 0$ vs $H_a: \sigma^2_{\text{City}} > 0$, providing F-statistics
of 2.40 and 11.58, respectively.


TABLE 30.9

Estimates of the Variance Components for the Coffee Data Using Four Methods

Method     Covariance Parameter   Estimate   Standard Error   df     Lower   Upper
Type I     State                  35.6       56.8              0.8    6.4    225322.5
Type I     City(State)            66.0       33.6              7.7   29.8    249.7
Type I     Residual               30.9        5.6             61     22.3    45.6
REML       State                  28.5       47.3              0.7    4.9    361333.6
REML       City(State)            67.2       34.7              7.5   30.0    261.2
REML       Residual               30.9        5.6             61     22.3    45.6
ML         State                  12.3       37.2              0.2    1.3    1.169 x 10^15
ML         City(State)            70.8       39.3              6.5   30.2    315.9
ML         Residual               30.9        5.6             61     22.3    45.5
MINQUE0    State                  21.3       58.2              0.3    2.4    4.545 x 10^12
MINQUE0    City(State)            88.3       60.0              4.3   32.7    645.5
MINQUE0    Residual               21.3        3.9             60     15.4    31.6


TABLE 30.10

Estimate of the Mean Coffee Price Using Four Methods
of Estimating the Variance Components

Method     Estimate   Standard Error   df
Type I     149.7      3.4940           4
REML       149.6      3.2935           4
ML         149.3      2.7822           4
MINQUE0    149.4      3.2649           4
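To make the degrees-of-freedom rule concrete, the sketch below computes $\nu = 2(Z\text{-value})^2$ with $Z = \text{estimate}/\text{standard error}$ and then forms the interval $[\nu\hat{\sigma}^2/\chi^2_{1-\alpha/2,\nu},\ \nu\hat{\sigma}^2/\chi^2_{\alpha/2,\nu}]$. This interval form is the standard Satterthwaite-type construction and is an assumption about how the limits in Table 30.9 were obtained; it does reproduce the tabled limits to rounding. The chi-square quantile is computed with a simple series and bisection so that only the standard library is needed; the function names are illustrative.

```python
from math import exp, lgamma, log

def chi2_cdf(x, df):
    """Chi-square CDF via the series for the regularized lower incomplete gamma function."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while term > total * 1e-15:
        n += 1
        term *= z / (a + n)
        total += term
    return total * exp(a * log(z) - z - lgamma(a))

def chi2_ppf(p, df):
    """Chi-square quantile found by bisection on the CDF (fractional df allowed)."""
    lo, hi = 0.0, max(1.0, df)
    while chi2_cdf(hi, df) < p:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def satterthwaite_interval(estimate, std_error, level=0.95):
    """Return (df, lower, upper) for a variance component estimate."""
    z_value = estimate / std_error
    df = 2.0 * z_value ** 2
    alpha = 1.0 - level
    lower = df * estimate / chi2_ppf(1.0 - alpha / 2.0, df)
    upper = df * estimate / chi2_ppf(alpha / 2.0, df)
    return df, lower, upper

# REML rows of Table 30.9: Residual and City(State).
print(satterthwaite_interval(30.9, 5.6))
print(satterthwaite_interval(67.2, 34.7))
```

Because the inputs here are the rounded estimates and standard errors from Table 30.9, the computed limits match the table only approximately. The very wide upper limits for the state component reflect its fractional degrees of freedom (below 1).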

When the nesting is in the treatment structure and involves fixed effect factors, the 
methods of analyzing fixed effects models are used. The major change is in the types of 
hypotheses that can be tested with the analysis of variance table as demonstrated with the 
data of Example 30.1. 

30.3.1 Example 30.1: Continuation 

There are two ways to write the model for the insecticide data:

$$y_{ijk} = \mu_{j(i)} + \varepsilon_{ijk}, \quad i = 1, 2, 3, 4, \quad j = 1, \ldots, m_i \ (m_i = 2, 3, \text{ or } 4), \quad k = 1, 2, 3 \tag{30.3}$$

or

$$y_{ijk} = \mu + \gamma_i + \rho_{j(i)} + \varepsilon_{ijk}, \quad i = 1, 2, 3, 4, \quad j = 1, \ldots, m_i \ (m_i = 2, 3, \text{ or } 4), \quad k = 1, 2, 3 \tag{30.4}$$

Table 30.11 contains the analysis of variance table for model (30.3). The hypothesis being
tested by the sum of squares due to products is

$$\mu_{1(1)} = \mu_{2(1)} = \mu_{3(1)} = \mu_{1(2)} = \mu_{2(2)} = \mu_{1(3)} = \mu_{2(3)} = \mu_{1(4)} = \mu_{2(4)} = \mu_{3(4)} = \mu_{4(4)}$$

Table 30.12 contains the analysis of variance table for model (30.4). The sum of squares due
to products in Table 30.11 has been partitioned into the sum of squares due to company and
the sum of squares due to products nested within companies. Since the two-way treatment
structure is nested (products are nested within companies), there is no measure for
interaction between the levels of product and the levels of company. The sum of squares due to
companies tests the hypothesis $\bar{\mu}_{\cdot(1)} = \bar{\mu}_{\cdot(2)} = \bar{\mu}_{\cdot(3)} = \bar{\mu}_{\cdot(4)}$, where
$\bar{\mu}_{\cdot(i)} = (1/m_i) \sum_{j=1}^{m_i} \mu_{j(i)} = \mu + \gamma_i + \bar{\rho}_{\cdot(i)}$.

The sum of squares due to products within companies tests the hypothesis that

$$\mu_{1(1)} = \mu_{2(1)} = \mu_{3(1)}, \quad \mu_{1(2)} = \mu_{2(2)}, \quad \mu_{1(3)} = \mu_{2(3)}, \quad \text{and} \quad \mu_{1(4)} = \mu_{2(4)} = \mu_{3(4)} = \mu_{4(4)}$$

in terms of the means model parameters, or

$$\rho_{1(1)} = \rho_{2(1)} = \rho_{3(1)}, \quad \rho_{1(2)} = \rho_{2(2)}, \quad \rho_{1(3)} = \rho_{2(3)}, \quad \text{and} \quad \rho_{1(4)} = \rho_{2(4)} = \rho_{3(4)} = \rho_{4(4)}$$

in terms of the effects model parameters.


TABLE 30.11

Analysis of Variance Table for Example 30.1 Based on Model (30.3)

Source              df    SS         MS        EMS                                     F-Value   Pr > F
Product(Company)    10    24201.0    2420.1    Var(Residual) + Q[Product(Company)]     40.09     <0.0001
Residual            22     1328.00     60.36   Var(Residual)                           —         —

$$\text{SSPRODUCT} = \sum_{i=1}^{4}\sum_{j=1}^{m_i} n_{j(i)}\,\bar{y}_{ij\cdot}^2 - n_{\cdot\cdot}\,\bar{y}_{\cdot\cdot\cdot}^2 \quad \text{and} \quad \text{SSERROR} = \sum_{i=1}^{4}\sum_{j=1}^{m_i}\sum_{k=1}^{n_{j(i)}} (y_{ijk} - \bar{y}_{ij\cdot})^2 = 22\hat{\sigma}_\varepsilon^2$$


TABLE 30.12

Analysis of Variance Table for Example 30.1 Based on Model (30.4)

Source              df    SS          MS        EMS                                              F-Value   Pr > F
Company              3    22649.6     7549.9    Var(Residual) + Q[Company, Product(Company)]     125.07    <0.0001
Product(Company)     7     1551.33     221.62   Var(Residual) + Q[Product(Company)]                3.67    0.0089
Residual            22     1328.00      60.36   Var(Residual)                                      —       —

$$\text{SSCOMPANY} = \sum_{i=1}^{4} n_{\cdot(i)}\,\bar{y}_{i\cdot\cdot}^2 - n_{\cdot\cdot}\,\bar{y}_{\cdot\cdot\cdot}^2$$

$$\text{SSPRODUCT(Company)} = \sum_{i=1}^{4}\sum_{j=1}^{m_i} n_{j(i)}\,\bar{y}_{ij\cdot}^2 - \sum_{i=1}^{4} n_{\cdot(i)}\,\bar{y}_{i\cdot\cdot}^2$$

$$\text{SSERROR} = \sum_{i=1}^{4}\sum_{j=1}^{m_i}\sum_{k=1}^{n_{j(i)}} (y_{ijk} - \bar{y}_{ij\cdot})^2 = 22\hat{\sigma}_\varepsilon^2$$
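The partition of the product sum of squares into company and product-within-company pieces can be verified numerically from the data in Table 30.2. The sketch below follows the sum-of-squares formulas given with Table 30.12, using only the Python standard library; the nested dictionary and variable names are illustrative.

```python
# Table 30.2 counts: counts[company][product] -> three containers.
counts = {
    "A": {1: [151, 135, 137], 2: [118, 132, 135], 3: [131, 137, 121]},
    "B": {1: [140, 152, 133], 2: [151, 132, 139]},
    "C": {1: [96, 108, 94], 2: [84, 87, 82]},
    "D": {1: [79, 74, 73], 2: [67, 78, 63], 3: [90, 81, 96], 4: [82, 89, 98]},
}

cells = [ys for products in counts.values() for ys in products.values()]
n_total = sum(len(ys) for ys in cells)
grand_total = sum(sum(ys) for ys in cells)

# Uncorrected sums of squares: overall, by company, by cell, and raw.
cf = grand_total ** 2 / n_total
company_term = sum(
    sum(sum(ys) for ys in products.values()) ** 2 / sum(len(ys) for ys in products.values())
    for products in counts.values()
)
cell_term = sum(sum(ys) ** 2 / len(ys) for ys in cells)
raw_term = sum(y * y for ys in cells for y in ys)

ss_company = company_term - cf
ss_product_in_company = cell_term - company_term
ss_error = raw_term - cell_term

df_company = len(counts) - 1
df_product = len(cells) - len(counts)
df_error = n_total - len(cells)

# Both fixed effects are tested against the single residual mean square.
ms_error = ss_error / df_error
f_company = (ss_company / df_company) / ms_error
f_product = (ss_product_in_company / df_product) / ms_error
print(round(ss_company, 1), round(ss_product_in_company, 2), round(ss_error, 2))
print(round(f_company, 2), round(f_product, 2))
```

The company and product-within-company sums of squares add up to the 10-degree-of-freedom product sum of squares of Table 30.11.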
Any of the appropriate multiple comparisons procedures (see Chapter 3) can be used
where one may want to 1) compare the $\bar{\mu}_{\cdot(i)}$, 2) compare the $\mu_{j(i)}$, 3) compare the $\mu_{j(i)}$ for
each $i$, or 4) compare any combination of the above. For unbalanced models, the type III
or IV sums of squares are appropriate for testing the various hypotheses, and the estimates
of the population marginal means provide estimates of the $\bar{\mu}_{\cdot(i)}$ and of the $\mu_{j(i)}$.

For other nested models, the expected mean squares in the analysis of variance table can
be used as guidelines for constructing proper F tests about the variance components, and
the mixed model "type III tests for fixed effects" can be used to test various hypotheses
about the fixed effects. The methods in Chapters 22 and 23 are to be used in this context.


30.4 Analysis Using JMP 

JMP® software is used to provide analyses for Examples 30.2 and 30.3. Figure 30.1 contains
a partial listing of the data set where chamber, temperature, gender, and person are
declared to be nominal variables. The fit model screen in Figure 30.2 includes temperature,
gender and temperature x gender as fixed effects and chamber(temperature) as a random effect.
The person error term is the residual of the model. Figure 30.3 contains the REML estimates
of the variance components with a Wald confidence interval for the chamber error and a
chi-square confidence interval for the residual or person variation and the type III tests
for the fixed effects. The least squares means with Tukey multiple comparisons are
displayed in Figure 30.4.

[Screen image: JMP data table example_30_2 with columns chamber, temperature, gender, person,
and score; 36 rows.]

FIGURE 30.1 JMP screen of comfort data for Example 30.2.

[Screen image: JMP Fit Model dialog with Y = score; model effects temperature,
chamber[temperature]& Random, gender, and temperature*gender; Personality: Standard Least
Squares; Method: REML (Recommended); Unbounded Variance Components selected.]

FIGURE 30.2 JMP fit model screen for comfort data of Example 30.2.

REML Variance Component Estimates

Random Effect          Var Ratio   Var Component   Std Error   95% Lower   95% Upper   Pct of Total
chamber[temperature]   1.4264706   2.3576389       1.604182    -0.786558   5.5018356    58.788
Residual                           1.6527778       0.4771158    1.0076869  3.198628     41.212
Total                              4.0104167                                           100.000

-2 LogLikelihood = 130.93183289

Fixed Effect Tests

Source               Nparm   DF   DFDen   F Ratio   Prob > F
temperature          2       2     6      7.1454    0.0259*
gender               1       1    24      2.0336    0.1667
temperature*gender   2       2    24      4.7563    0.0182*

FIGURE 30.3 JMP AOV table for random and fixed effects for Example 30.2.

Level                 Least Sq Mean
75,Female   A         8.8333333
70,Female   A B       8.1666667
75,Male     A B       7.8333333
70,Male     A B       6.1666667
65,Male     A B       4.0000000
65,Female     B       2.8333333

Levels not connected by same letter are significantly different.

FIGURE 30.4 Least squares means from JMP with Tukey multiple comparison for Example 30.2.

The data screen in Figure 30.5 contains a partial listing of the coffee
prices for Example 30.3. The fit model screen in Figure 30.6 has state and city(state) as
random effects. The store to store variation is measured by the residual. Figure 30.7
contains the estimates of the variance components as well as the estimate of the overall mean.
Wald confidence intervals are provided for the state and city(state) variance components
and a chi-square confidence interval is provided for the residual or store to store variation.
The estimate of the standard error of the overall mean is 3.359 while the estimate of the
standard error of the overall mean from SAS®-Mixed using REML is 3.2935 (see Table 30.10).

[Screen image: JMP data table example_30_3_coffee with columns state, city, store, and price;
90 rows, with missing prices for the store positions left empty in Table 30.3.]

FIGURE 30.5 JMP data screen for coffee prices for Example 30.3.

[Screen image: JMP Fit Model dialog with Y = price; model effects state& Random and
city[state]& Random; Unbounded Variance Components selected.]

FIGURE 30.6 JMP fit model screen for Example 30.3.

Parameter Estimates

Term        Estimate    Std Error   DFDen   t Ratio   Prob>|t|
Intercept   149.59566   3.359306    2.934   44.53     <.0001*

REML Variance Component Estimates

Random Effect   Var Ratio   Var Component   Std Error   95% Lower   95% Upper   Pct of Total
state           0.9253692   28.566337       47.346341   -64.23249   121.36516    22.561
city[state]     2.1762524   67.181359       34.706183    -0.84276   135.20548    53.058
Residual                    30.870206        5.582685    22.299592   45.567205   24.381
Total                       126.6179                                            100.000

FIGURE 30.7 REML estimates of the variance components for Example 30.3 from JMP.
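The Wald limits reported by JMP for the chamber, state, and city variance components are consistent with the usual normal-theory form, estimate plus or minus the standard normal quantile times the standard error. The sketch below assumes that form (it is inferred from the printed numbers, not taken from JMP documentation) and uses only the Python standard library.

```python
from statistics import NormalDist

def wald_interval(estimate, std_error, level=0.95):
    """Normal-theory Wald interval: estimate +/- z * standard error."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return estimate - z * std_error, estimate + z * std_error

# Estimates and standard errors as printed in Figures 30.3 and 30.7.
chamber = wald_interval(2.3576389, 1.604182)
state = wald_interval(28.566337, 47.346341)
city = wald_interval(67.181359, 34.706183)
print(chamber, state, city)
```

A Wald interval for a variance component can have a negative lower limit, as it does here for all three components, which is one reason the chi-square intervals of Table 30.9 are used in Section 30.3.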
30.5 Concluding Remarks

The concept of nesting was defined in this chapter and examples were used to demonstrate 
that nesting can occur in the treatment structure, the design structure or both. Nested 
models can involve fixed effects, random effects or mixed effects models, and analyses of 
the examples demonstrate some of the issues involved. 


30.6 Exercises 

30.1 Types of salt: A food scientist wanted to determine if the type and source of salt
when mixed into instant mashed potatoes affect flavor profiles. Three types of
salt (kosher, sea, and table salt) were selected, with four sources of sea salt (four
different parts of the world, denoted by brands 1, 2, 3, and 4), three brands of
kosher salt (brands 5, 6, and 7), and two brands of table salt (brands 8 and 9).
Thus the study involves nine brands of salt, but the brands come from three
different types. On a given day, a taste panel session was held where six of the salts
were evaluated for salt intensity by five trained taste panelists. The median of the
five panel members' evaluations (0-100) was used as the response for each salt
tasted. The following table contains the session, brand ($b_i$), and the corresponding
response ($r_i$). The order in which the panel tasted the products was randomized within
each session. Construct a model to describe these data, compare the salt type means,
and compare the brand means. Estimate the variance components and provide
confidence intervals. Carry out a multiple comparison method to compare the
type means (sea, kosher, and table) and the brand means within a type.


Data for Exercise 30.1

Session   b1   r1   b2   r2   b3   r3   b4   r4   b5   r5   b6   r6
1         1    54   2    57   4    68   5    49   7    49   8    46
2         2    67   3    73   5    57   6    62   8    55   9    66
3         1    62   3    66   4    64   6    60   7    53   9    58
4         1    72   2    63   5    57   6    57   7    60   9    60
5         1    63   3    60   4    79   5    58   8    61   9    66
6         2    72   3    67   4    66   6    49   7    52   8    50
7         1    61   3    71   5    59   6    64   7    66   8    67
8         1    60   2    64   4    69   6    57   8    41   9    64
9         2    59   3    66   4    59   5    49   7    53   9    47
10        4    73   5    64   6    65   7    59   8    62   9    63
11        1    56   2    52   3    55   4    62   5    54   6    56
12        1    66   2    72   3    72   7    71   8    59   9    60


30.2 Coffee prices and types of stores: The coffee price study of Example 30.3 was modified:
within each city of a state, three types of stores were included in the sampling.
A random sample of each of three types of stores (large chain store, locally owned
store, and convenience store) was selected from the stores of that type within
each city. Construct a model to describe the following data, and obtain estimates
of the store type means as well as the variance components. Construct confidence
intervals about the variance components and carry out a multiple comparison
among the store type means.


Data for Exercise 30.2

State   City   Type          Store 1   Store 2   Store 3   Store 4   Store 5   Store 6
1       1      Chain         142       128       144       144       126       —
1       1      Convenience   152       154       134       144       140       150
1       1      Local         154       128       150       —         —         —
1       2      Chain         136       100       126       —         —         —
1       2      Convenience   124       134       132       136       124       116
1       2      Local         140       134       140       154       —         —
1       3      Chain         146       152       138       130       154       —
1       3      Convenience   148       160       150       152       —         —
1       3      Local         142       154       138       134       —         —
2       1      Chain         162       150       146       146       —         —
2       1      Convenience   162       168       148       158       —         —
2       1      Local         146       146       156       158       164       152
2       2      Chain         160       158       156       154       160       —
2       2      Convenience   162       182       170       166       —         —
2       2      Local         154       158       180       172       —         —
2       3      Chain         164       174       180       144       166       136
2       3      Convenience   164       162       166       —         —         —
2       3      Local         162       190       162       156       172       168
2       4      Chain         126       140       122       126       142       124
2       4      Convenience   148       144       148       —         —         —
2       4      Local         126       128       124       116       —         —
2       5      Chain         148       134       140       146       134       124
2       5      Convenience   132       130       134       170       —         —
2       5      Local         136       130       118       172       148       —
3       1      Chain         110       120       118       134       116       —
3       1      Convenience   124       138       114       134       126       —
3       1      Local         118       120       118       124       116       —
3       2      Chain         130       154       144       156       152       —
3       2      Convenience   166       132       166       160       —         —
3       2      Local         148       156       144       150       —         —
3       3      Chain         120       130       136       126       —         —
3       3      Convenience   152       134       136       136       144       —
3       3      Local         132       148       126       134       134       —
4       1      Chain         146       170       154       150       —         —
4       1      Convenience   154       148       148       142       —         —
4       1      Local         146       146       140       158       132       —
4       2      Chain         126       154       130       142       126       120
4       2      Convenience   130       138       144       138       140       —
4       2      Local         138       126       132       —         —         —
TABLE A.1

Percentage Points of the Maximum F-Ratio

Upper 5% Points

ν      2      3      4      5      6      7      8      9      10     11     12
2      39.0   87.5   142    202    266    333    403    475    550    626    704
3      15.4   27.8   39.2   50.7   62.0   72.9   83.5   93.9   104    114    124
4      9.60   15.5   20.6   25.2   29.5   33.6   37.5   41.1   44.6   48.0   51.4
5      7.15   10.8   13.7   16.3   18.7   20.8   22.9   24.7   26.5   28.2   29.9
6      5.82   8.38   10.4   12.1   13.7   15.0   16.3   17.5   18.6   19.7   20.7
7      4.99   6.94   8.44   9.70   10.8   11.8   12.7   13.5   14.3   15.1   15.8
8      4.43   6.00   7.18   8.12   9.03   9.78   10.5   11.1   11.7   12.2   12.7
9      4.03   5.34   6.31   7.11   7.80   8.41   8.95   9.45   9.91   10.3   10.7
10     3.72   4.85   5.67   6.34   6.92   7.42   7.87   8.28   8.66   9.01   9.34
12     3.28   4.16   4.79   5.30   5.72   6.09   6.42   6.72   7.00   7.25   7.48
15     2.86   3.54   4.01   4.37   4.68   4.95   5.19   5.40   5.59   5.77   5.93
20     2.46   2.95   3.29   3.54   3.76   3.94   4.10   4.24   4.37   4.49   4.59
30     2.07   2.40   2.61   2.78   2.91   3.02   3.12   3.21   3.29   3.36   3.39
60     1.67   1.85   1.96   2.04   2.11   2.17   2.22   2.26   2.30   2.33   2.36
∞      1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00

Upper 1% Points

ν      2      3      4      5      6      7       8       9       10      11      12
2      199    448    729    1036   1362   1705    2063    2432    2813    3204    3605
3      47.5   85     120    151    184    21(6)   24(9)   28(1)   31(0)   33(7)   36(1)
4      23.2   37     49     59     69     79      89      97      106     113     120
5      14.9   22     28     33     38     42      46      50      54      57      60
6      11.1   15.5   19.1   22     25     27      30      32      34      36      37
7      8.89   12.1   14.5   16.5   18.4   20      22      23      24      26      27
8      7.50   9.9    11.7   13.2   14.5   15.8    16.9    17.9    18.9    19.8    21
9      6.54   8.5    9.9    11.1   12.1   13.1    13.9    14.7    15.3    16.0    16.6
10     5.85   7.4    8.6    9.6    10.4   11.1    11.8    12.4    12.9    13.4    13.9
12     4.91   6.1    6.9    7.6    8.2    8.7     9.1     9.5     9.9     10.2    10.6
15     4.07   4.9    5.5    6.0    6.4    6.7     7.1     7.3     7.5     7.8     8.0
20     3.32   3.8    4.3    4.6    4.9    5.1     5.3     5.5     5.6     5.8     5.9
30     2.63   3.0    3.3    3.4    3.6    3.7     3.8     3.9     4.0     4.1     4.2
60     1.96   2.2    2.3    2.4    2.4    2.5     2.5     2.6     2.6     2.7     2.7
∞      1.00   1.0    1.0    1.0    1.0    1.0     1.0     1.0     1.0     1.0     1.0

Source: Beyer, W. H., ed., Handbook of Tables for Probability and Statistics, Second Edition, The Chemical Rubber
Co., 1968. With permission.


TABLE A.2

Table of Bonferroni Critical Points

Values of c for 1 - α = .95

m \ ν   5      7      10     12     15     20     24     30     40     60     120    ∞
2       3.17   2.84   2.64   2.56   2.49   2.42   2.39   2.36   2.33   2.30   2.27   2.24
3       3.54   3.13   2.87   2.78   2.69   2.61   2.58   2.54   2.50   2.47   2.43   2.39
4       3.81   3.34   3.04   2.94   2.84   2.75   2.70   2.66   2.62   2.58   2.54   2.50
5       4.04   3.50   3.17   3.06   2.95   2.85   2.80   2.75   2.71   2.66   2.62   2.58
6       4.22   3.64   3.28   3.15   3.04   2.93   2.88   2.83   2.78   2.73   2.68   2.64
7       4.38   3.76   3.37   3.24   3.11   3.00   2.94   2.89   2.84   2.79   2.74   2.69
8       4.53   3.86   3.45   3.31   3.18   3.06   3.00   2.94   2.89   2.84   2.79   2.74
9       4.66   3.95   3.52   3.37   3.24   3.11   3.05   2.99   2.93   2.88   2.83   2.77
10      4.78   4.03   3.58   3.43   3.29   3.16   3.09   3.03   2.97   2.92   2.86   2.81
15      5.25   4.36   3.83   3.65   3.48   3.33   3.26   3.19   3.12   3.06   2.99   2.94
20      5.60   4.59   4.01   3.80   3.62   3.46   3.38   3.30   3.23   3.16   3.09   3.02
25      5.89   4.78   4.15   3.93   3.74   3.55   3.47   3.39   3.31   3.24   3.16   3.09
30      6.15   4.95   4.27   4.04   3.82   3.63   3.54   3.46   3.38   3.30   3.22   3.15
35      6.36   5.09   4.37   4.13   3.90   3.70   3.61   3.52   3.43   3.34   3.27   3.19
40      6.56   5.21   4.45   4.20   3.97   3.76   3.66   3.57   3.48   3.39   3.31   3.23
45      6.70   5.31   4.53   4.26   4.02   3.80   3.70   3.61   3.51   3.42   3.34   3.26
50      6.86   5.40   4.59   4.32   4.07   3.85   3.74   3.65   3.55   3.46   3.37   3.29
100     8.00   6.08   5.06   4.73   4.42   4.15   4.04   3.90   3.79   3.69   3.58   3.48
250     9.68   7.06   5.70   5.27   4.90   4.56   4.4*   4.2*   4.1*   3.97   3.83   3.72
TABLE A.2 (continued) 


Values of c for 1 - α = .99 


ν 

5 

7 

10 

12 

15 

20 

24 

30 

40 

60 

120 

∞ 

2 

4.78 

4.03 

3.58 

3.43 

3.29 

3.16 

3.09 

3.03 

2.97 

2.92 

2.86 

2.81 

3 

5.25 

4.36 

3.83 

3.65 

3.48 

3.33 

3.26 

3.19 

3.12 

3.06 

2.99 

2.94 

4 

5.60 

4.59 

4.01 

3.80 

3.62 

3.46 

3.38 

3.30 

3.23 

3.16 

3.09 

3.02 

5 

5.89 

4.78 

4.15 

3.93 

3.74 

3.55 

3.47 

3.39 

3.31 

3.24 

3.16 

3.09 

6 

6.15 

4.95 

4.27 

4.04 

3.82 

3.63 

3.54 

3.46 

3.38 

3.30 

3.22 

3.15 

7 

6.36 

5.09 

4.37 

4.13 

3.90 

3.70 

3.61 

3.52 

3.43 

3.34 

3.27 

3.19 

8 

6.56 

5.21 

4.45 

4.20 

3.97 

3.76 

3.66 

3.57 

3.48 

3.39 

3.31 

3.23 

9 

6.70 

5.31 

4.53 

4.26 

4.02 

3.80 

3.70 

3.61 

3.51 

3.42 

3.34 

3.26 

10 

6.86 

5.40 

4.59 

4.32 

4.07 

3.85 

3.74 

3.65 

3.55 

3.46 

3.37 

3.29 

15 

7.51 

5.79 

4.86 

4.56 

4.29 

4.03 

3.91 

3.80 

3.70 

3.59 

3.50 

3.40 

20 

8.00 

6.08 

5.06 

4.73 

4.42 

4.15 

4.04 

3.90 

3.79 

3.69 

3.58 

3.48 

25 

8.37 

6.30 

5.20 

4.86 

4.53 

4.25 

4.1* 

3.98 

3.88 

3.76 

3.64 

3.54 

30 

8.68 

6.49 

5.33 

4.95 

4.61 

4.33 

4.2* 

4.13 

3.93 

3.81 

3.69 

3.59 

35 

8.95 

6.67 

5.44 

5.04 

4.71 

4.39 

4.3* 

4.26 

3.97 

3.84 

3.73 

3.63 

40 

9.19 

6.83 

5.52 

5.12 

4.78 

4.46 

4.3* 

4.1* 

4.01 

3.89 

3.77 

3.66 

45 

9.41 

6.93 

5.60 

5.20 

4.84 

4.52 

4.3* 

4.2* 

4.1* 

3.93 

3.80 

3.69 

50 

9.68 

7.06 

5.70 

5.27 

4.90 

4.56 

4.4* 

4.2* 

4.1* 

3.97 

3.83 

3.72 

100 

11.04 

7.80 

6.20 

5.70 

5.20 

4.80 

4.7* 

4.4* 

4.5* 


4.00 

3.89 

250 

13.26 

8.83 

6.9* 

6.3* 

5.8* 

5.2* 

5.0* 

4.9* 

4.8* 



4.11 

*Obtained by graphical interpolation. 

Source: Dunn, O. Journal of the American Statistical Association, 56: 52-64, 1961. With permission. 
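
The entries of Table A.2 can be checked numerically. The sketch below assumes that c is the upper α/(2m) point of Student's t-distribution with ν degrees of freedom, where m (the row label) is the number of comparisons. The tabled values are consistent with this reading: for example, the m = 10 row for 1 - α = .95 repeats as the m = 2 row for 1 - α = .99, since .05/20 = .01/4. The function names, the Simpson-rule integration, and the search bracket are illustrative choices and are not part of the original table.

```python
import math
from statistics import NormalDist


def t_pdf(t, nu):
    """Density of Student's t with nu degrees of freedom."""
    log_const = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                 - 0.5 * math.log(nu * math.pi))
    return math.exp(log_const - (nu + 1) / 2 * math.log1p(t * t / nu))


def central_prob(c, nu, n=4000):
    """P(|T| <= c) by composite Simpson's rule on [0, c]; n must be even."""
    h = c / n
    total = t_pdf(0.0, nu) + t_pdf(c, nu)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * t_pdf(k * h, nu)
    return 2 * total * h / 3


def bonferroni_c(m, nu, alpha):
    """Critical point c with P(|T| <= c) = 1 - alpha/m (assumed definition)."""
    if math.isinf(nu):
        # Infinite degrees of freedom: use the standard normal quantile.
        return NormalDist().inv_cdf(1 - alpha / (2 * m))
    target = 1 - alpha / m
    lo, hi = 0.0, 64.0  # assumed bracket; ample for the tabled range
    for _ in range(50):  # bisection on the monotone coverage function
        mid = (lo + hi) / 2
        if central_prob(mid, nu) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

Calling `bonferroni_c(2, 10, 0.05)` should land within about 0.01 of the tabled 2.64; differences in the last digit can arise from the rounding used in the original table.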


TABLE A.3 

Probability Points for Multivariate t-Distribution 

Entries are $t_{\alpha/2:q,m}$, where $P[\max |T_i| \le t_{\alpha/2:q,m} \text{ for } i = 1, 2, \ldots, q] = 1 - \alpha$ and $\mathbf{t} = [T_1, T_2, \ldots, T_q]$ is distributed $S(\mathbf{t}: q, m; \mathbf{I})$ 

m 

1 

2 

3 

4 

5 

6 

8 

10 

12 

15 

20 

1 - α = .90 

3 

2.353 

2.989 

3.369 

3.637 

3.844 

4.011 

4.272 

4.471 

4.631 

4.823 

5.066 

4 

2.132 

2.662 

2.976 

3.197 

3.368 

3.506 

3.722 

3.887 

4.020 

4.180 

4.383 

5 

2.015 

2.491 

2.769 

2.965 

3.116 

3.239 

3.430 

3.576 

3.694 

3.837 

4.018 

6 

1.943 

2.385 

2.642 

2.822 

2.961 

3.074 

3.249 

3.384 

3.493 

3.624 

3.790 

7 

1.895 

2.314 

2.556 

2.725 

2.856 

2.962 

3.127 

3.253 

3.355 

3.478 

3.635 

8 

1.860 

2.262 

2.494 

2.656 

2.780 

2.881 

3.038 

3.158 

3.255 

3.373 

3.522 

9 

1.833 

2.224 

2.447 

2.603 

2.723 

2.819 

2.970 

3.086 

3.179 

3.292 

3.436 

10 

1.813 

2.193 

2.410 

2.562 

2.678 

2.771 

2.918 

3.029 

3.120 

3.229 

3.360 

11 

1.796 

2.169 

2.381 

2.529 

2.642 

2.733 

2.875 

2.984 

3.072 

3.178 

3.313 

12 

1.782 

2.149 

2.357 

2.501 

2.612 

2.701 

2.840 

2.946 

3.032 

3.136 

3.268 

15 

1.753 

2.107 

2.305 

2.443 

2.548 

2.633 

2.765 

2.865 

2.947 

3.045 

3.170 

20 

1.725 

2.065 

2.255 

2.386 

2.486 

2.567 

2.691 

2.786 

2.863 

2.956 

3.073 

25 

1.708 

2.041 

2.226 

2.353 

2.450 

2.528 

2.648 

2.740 

2.814 

2.903 

3.016 

30 

1.697 

2.025 

2.207 

2.331 

2.426 

2.502 

2.620 

2.709 

2.781 

2.868 

2.978 

40 

1.684 

2.006 

2.183 

2.305 

2.397 

2.470 

2.585 

2.671 

2.741 

2.825 

2.931 

60 

1.671 

1.986 

2.160 

2.278 

2.368 

2.439 

2.550 

2.634 

2.701 

2.782 

2.884 






1 - α = .95 






3 

3.183 

3.960 

4.430 

4.764 

5.023 

5.233 

5.562 

5.812 

6.015 

6.259 

6.567 

4 

2.777 

3.382 

3.745 

4.003 

4.203 

4.366 

4.621 

4.817 

4.975 

5.166 

5.409 

5 

2.571 

3.091 

3.399 

3.619 

3.789 

3.928 

4.145 

4.312 

4.447 

4.611 

4.819 

6 

2.447 

2.916 

3.193 

3.389 

3.541 

3.664 

3.858 

4.008 

4.129 

4.275 

4.462 

7 

2.365 

2.800 

3.056 

3.236 

3.376 

3.489 

3.668 

3.805 

3.916 

4.051 

4.223 

8 

2.306 

2.718 

2.958 

3.128 

3.258 

3.365 

3.532 

3.660 

3.764 

3.891 

4.052 

9 

2.262 

2.657 

2.885 

3.046 

3.171 

3.272 

3.430 

3.552 

3.651 

3.770 

3.923 

10 

2.228 

2.609 

2.829 

2.984 

3.103 

3.199 

3.351 

3.468 

3.562 

3.677 

3.823 

11 

2.201 

2.571 

2.784 

2.933 

3.048 

3.142 

3.288 

3.400 

3.491 

3.602 

3.743 

12 

2.179 

2.540 

2.747 

2.892 

3.004 

3.095 

3.236 

3.345 

3.433 

3.541 

3.677 

15 

2.132 

2.474 

2.669 

2.805 

2.910 

2.994 

3.126 

3.227 

3.309 

3.409 

3.536 

20 

2.086 

2.411 

2.594 

2.722 

2.819 

2.898 

3.020 

3.114 

3.190 

3.282 

3.399 

25 

2.060 

2.374 

2.551 

2.673 

2.766 

2.842 

2.959 

3.048 

3.121 

3.208 

3.320 

30 

2.042 

2.350 

2.522 

2.641 

2.732 

2.805 

2.918 

3.005 

3.075 

3.160 

3.267 

40 

2.021 

2.321 

2.488 

2.603 

2.690 

2.760 

2.869 

2.952 

3.019 

3.100 

3.203 

60 

2.000 

2.292 

2.454 

2.564 

2.649 

2.716 

2.821 

2.900 

2.964 

3.041 

3.139 






1 - α = .99 






3 

5.841 

7.127 

7.914 

8.479 

8.919 

9.277 

9.838 

10.269 

10.616 

11.034 

11.559 

4 

4.604 

5.462 

5.985 

6.362 

6.656 

6.897 

7.274 

7.565 

7.801 

8.087 

8.451 

5 

4.032 

4.700 

5.106 

5.398 

5.625 

5.812 

6.106 

6.333 

6.519 

6.744 

7.050 

6 

3.707 

4.271 

4.611 

4.855 

5.046 

5.202 

5.449 

5.640 

5.796 

5.985 

6.250 

7 

3.500 

3.998 

4.296 

4.510 

4.677 

4.814 

5.031 

5.198 

5.335 

5.502 

5.716 

8 

3.355 

3.809 

4.080 

4.273 

4.424 

4.547 

4.742 

4.894 

5.017 

5.168 

5.361 

9 

3.250 

3.672 

3.922 

4.100 

4.239 

4.353 

4.532 

4.672 

4.785 

4.924 

5.103 

10 

3.169 

3.567 

3.801 

3.969 

4.098 

4.205 

4.373 

4.503 

4.609 

4.739 

4.905 

11 

3.106 

3.485 

3.707 

3.865 

3.988 

4.087 

4.247 

4.370 

4.470 

4.593 

4.750 

12 

3.055 

3.418 

3.631 

3.782 

3.899 

3.995 

4.146 

4.263 

4.359 

4.475 

4.625 

15 

2.947 

3.279 

3.472 

3.608 

3.714 

3.800 

3.935 

4.040 

4.125 

4.229 

4.363 

20 

2.845 

3.149 

3.323 

3.446 

3.541 

3.617 

3.738 

3.831 

3.907 

3.999 

4.117 

25 

2.788 

3.075 

3.239 

3.354 

3.442 

3.514 

3.626 

3.713 

3.783 

3.869 

3.978 

30 

2.750 

3.027 

3.185 

3.295 

3.379 

3.448 

3.555 

3.637 

3.704 

3.785 

3.889 

40 

2.705 

2.969 

3.119 

3.223 

3.303 

3.367 

3.468 

3.545 

3.607 

3.683 

3.780 

60 

2.660 

2.913 

3.055 

3.154 

3.229 

3.290 

3.384 

3.456 

3.515 

3.586 

3.676 


Source: Hahn, G. J. and Hendrickson, R. W., Biometrika, Vol. 58. With permission. 
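
The entries of Table A.3 can be approximated by one-dimensional numerical integration. The sketch below assumes that S(t: q, m; I) denotes the q-variate t-distribution with identity correlation matrix, so that T_i = Z_i / S with Z_1, ..., Z_q independent standard normal variables and S² an independent chi-square variable with m degrees of freedom divided by m. Under that assumption, P[max |T_i| ≤ c] = E[(2Φ(cS) - 1)^q]. The function names, grid size, and integration limit are illustrative choices.

```python
import math


def log_density_s(s, m):
    """Log density of S = sqrt(chi2_m / m) at s > 0."""
    return ((m - 1) * math.log(s) + (m / 2) * math.log(m) - m * s * s / 2
            - (m / 2 - 1) * math.log(2) - math.lgamma(m / 2))


def critical_point(q, m, conf, upper=8.0, n=4000):
    """Solve P[max_i |T_i| <= c] = conf for c by bisection."""
    h = upper / n
    # Simpson nodes and weights times the density of S. The integrand is
    # zero at s = 0, and the tail beyond `upper` is neglected.
    nodes = []
    for k in range(1, n + 1):
        s = k * h
        w = 1 if k == n else (4 if k % 2 else 2)
        nodes.append((s, w * math.exp(log_density_s(s, m)) * h / 3))

    def coverage(c):
        # erf(c*s/sqrt(2)) equals 2*Phi(c*s) - 1 for the standard normal.
        return sum(wt * math.erf(c * s / math.sqrt(2)) ** q
                   for s, wt in nodes)

    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if coverage(mid) < conf:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

With q = 1 the result reduces to the usual two-sided t critical point, which gives a quick check against the first column of the table.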
Source: Pearson, E. S. and Hartley, H. O., Biometrika Tables for Statisticians, Vol. 1, Third Edition, Table 29, published by the Biometrika Trustees, Cambridge University 
Press, London, 1966. With permission. 
TABLE A.5 


Percentage Points of the Duncan New Multiple Range Test 


Error df 

α 

r = Number of Ordered Steps between Means 

2 

3 

4 

5 

6 

7 

8 

9 

10 

12 

14 

16 

18 

20 
1 

.05 

18.0 

18.0 

18.0 

18.0 

18.0 

18.0 

18.0 

18.0 

18.0 

18.0 

18.0 

18.0 

18.0 

18.0 


.01 

90.0 

90.0 

90.0 

90.0 

90.0 

90.0 

90.0 

90.0 

90.0 

90.0 

90.0 

90.0 

90.0 

90.0 
2 

.05 

6.09 

6.09 

6.09 

6.09 

6.09 

6.09 

6.09 

6.09 

6.09 

6.09 

6.09 

6.09 

6.09 

6.09 
.01 

14.0 

14.0 

14.0 

14.0 

14.0 

14.0 

14.0 

14.0 

14.0 

14.0 

14.0 

14.0 

14.0 

14.0 
4.50 
4.50 

4.50 
4.50 

4.50 
8.26 
4.02 

4.02 

4.02 

4.02 

4.02 


.01 

6.51 

6.8 

6.9 

7.0 

7.1 

7.1 

7.2 

7.2 

7.3 

7.3 

7.4 

7.4 

7.5 

7.5 

5 

.05 

3.64 

3.74 

3.79 

3.83 

3.83 

3.83 

3.83 

3.83 

3.83 

3.83 

3.83 

3.83 

3.83 

3.83 


.01 

5.70 

5.96 

6.11 

6.18 

6.26 

6.33 

6.40 

6.44 

6.5 

6.6 

6.6 

6.7 

6.7 

6.8 

6 

.05 

3.46 

3.58 

3.64 

3.68 

3.68 

3.68 

3.68 

3.68 

3.68 

3.68 

3.68 

3.68 

3.68 

3.68 


.01 

5.24 

5.51 

5.65 

5.73 

5.81 

5.88 

5.95 

6.00 

6.0 

6.1 

6.2 

6.2 

6.3 

6.3 

7 

.05 

3.35 

3.47 

3.54 

3.58 

3.60 

3.61 

3.61 

3.61 

3.61 

3.61 

3.61 

3.61 

3.61 

3.61 


.01 

4.95 

5.22 

5.37 

5.45 

5.53 

5.61 

5.69 

5.73 

5.8 

5.8 

5.9 

5.9 

6.0 

6.0 

8 

.05 

3.26 

3.39 

3.47 

3.52 

3.55 

3.56 

3.56 

3.56 

3.56 

3.56 

3.56 

3.56 

3.56 

3.56 


.01 

4.74 

5.00 

5.14 

5.23 

5.32 

5.40 

5.47 

5.51 

5.5 

5.6 

5.7 

5.7 

5.8 

5.8 

9 

.05 

3.20 

3.34 

3.41 

3.47 

3.50 

3.52 

3.52 

3.52 

3.52 

3.52 

3.52 

3.52 

3.52 

3.52 


.01 

4.60 

4.86 

4.99 

5.08 

5.17 

5.25 

5.32 

5.36 

5.4 

5.5 

5.5 

5.6 

5.7 

5.7 

10 

.05 

3.15 

3.30 

3.37 

3.43 

3.46 

3.47 

3.47 

3.47 

3.47 

3.47 

3.47 

3.47 

3.47 

3.48 


.01 

4.48 

4.73 

4.88 

4.96 

5.06 

5.13 

5.20 

5.24 

5.28 

5.36 

5.42 

5.48 

5.54 

5.55 

11 

.05 

3.11 

3.27 

3.35 

3.39 

3.43 

3.44 

3.45 

3.46 

3.46 

3.46 

3.46 

3.46 

3.47 

3.48 


.01 

4.39 

4.63 

4.77 

4.86 

4.94 

5.01 

5.06 

5.12 

5.15 

5.24 

5.28 

5.34 

5.38 

5.39 

12 

.05 

3.08 

3.23 

3.33 

3.36 

3.40 

3.42 

3.44 

3.44 

3.46 

3.46 

3.46 

3.46 

3.47 

3.48 


.01 

4.32 

4.55 

4.68 

4.76 

4.84 

4.92 

4.96 

5.02 

5.07 

5.13 

5.17 

5.22 

5.23 

5.26 

13 

.05 

3.06 

3.21 

3.30 

3.35 

3.38 

3.41 

3.42 

3.44 

3.45 

3.45 

3.46 

3.46 

3.47 

3.47 


.01 

4.26 

4.48 

4.62 

4.69 

4.74 

4.84 

4.88 

4.94 

4.98 

5.04 

5.08 

5.13 

5.14 

5.15 

14 

.05 

3.03 

3.18 

3.27 

3.33 

3.37 

3.39 

3.41 

3.42 

3.44 

3.45 

3.46 

3.46 

3.47 

3.47 


.01 

4.21 

4.42 

4.55 

4.63 

4.70 

4.78 

4.83 

4.87 

4.91 

4.96 

5.00 

5.04 

5.06 

5.07 

15 

.05 

3.01 

3.16 

3.25 

3.31 

3.36 

3.38 

3.40 

3.42 

3.43 

3.44 

3.45 

3.46 

3.47 

3.47 


.01 

4.17 

4.37 

4.50 

4.58 

4.64 

4.72 

4.77 

4.81 

4.84 

4.90 

4.94 

4.97 

4.99 

5.00 

16 

.05 

3.00 

3.15 

3.23 

3.30 

3.34 

3.37 

3.39 

3.41 

3.43 

3.44 

3.45 

3.46 

3.47 

3.47 


.01 

4.13 

4.34 

4.45 

4.54 

4.60 

4.67 

4.72 

4.76 

4.79 

4.84 

4.88 

4.91 

4.93 

4.94 

17 

.05 

2.98 

3.13 

3.22 

3.28 

3.33 

3.36 

3.38 

3.40 

3.42 

3.44 

3.45 

3.46 

3.47 

3.47 


.01 

4.10 

4.30 

4.41 

4.50 

4.56 

4.63 

4.68 

4.72 

4.75 

4.80 

4.83 

4.86 

4.88 

4.89 

18 

.05 

2.97 

3.12 

3.21 

3.27 

3.32 

3.35 

3.37 

3.39 

3.41 

3.43 

3.45 

3.46 

3.47 

3.47 


.01 

4.07 

4.27 

4.38 

4.46 

4.53 

4.59 

4.64 

4.68 

4.71 

4.76 

4.79 

4.82 

4.84 

4.85 

19 

.05 

2.96 

3.11 

3.19 

3.26 

3.31 

3.35 

3.37 

3.39 

3.41 

3.43 

3.44 

3.46 

3.47 

3.47 


.01 

4.05 

4.24 

4.35 

4.43 

4.50 

4.56 

4.61 

4.64 

4.67 

4.72 

4.76 

4.79 

4.81 

4.82 

20 

.05 

2.95 

3.10 

3.18 

3.25 

3.30 

3.34 

3.36 

3.38 

3.40 

3.43 

3.44 

3.46 

3.46 

3.47 


.01 

4.02 

4.22 

4.33 

4.40 

4.47 

4.53 

4.58 

4.61 

4.65 

4.69 

4.73 

4.76 

4.78 

4.79 

22 

.05 

2.93 

3.08 

3.17 

3.24 

3.29 

3.32 

3.35 

3.37 

3.39 

3.42 

3.44 

3.45 

3.46 

3.47 


.01 

3.99 

4.17 

4.28 

4.36 

4.42 

4.48 

4.53 

4.57 

4.60 

4.65 

4.68 

4.71 

4.74 

4.75 
TABLE A.5 (continued) 


Error df 

α 

r = Number of Ordered Steps between Means 




2 

3 

4 

5 

6 

7 

8 

9 

10 

12 

14 

16 

18 

20 

24 

.05 

2.92 

3.07 

3.15 

3.22 

3.28 

3.31 

3.34 

3.37 

3.38 

3.41 

3.44 

3.45 

3.46 

3.47 


.01 

3.96 

4.14 

4.24 

4.33 

4.39 

4.44 

4.49 

4.53 

4.57 

4.62 

4.64 

4.67 

4.70 

4.72 

26 

.05 

2.91 

3.06 

3.14 

3.21 

3.27 

3.30 

3.34 

3.36 

3.38 

3.41 

3.43 

3.45 

3.46 

3.47 


.01 

3.93 

4.11 

4.21 

4.30 

4.36 

4.41 

4.46 

4.50 

4.53 

4.58 

4.62 

4.65 

4.67 

4.69 

28 

.05 

2.90 

3.04 

3.13 

3.20 

3.26 

3.30 

3.33 

3.35 

3.37 

3.40 

3.43 

3.45 

3.46 

3.47 


.01 

3.91 

4.08 

4.18 

4.28 

4.34 

4.39 

4.43 

4.47 

4.51 

4.56 

4.60 

4.62 

4.65 

4.67 

30 

.05 

2.89 

3.04 

3.12 

3.20 

3.25 

3.29 

3.32 

3.35 

3.37 

3.40 

3.43 

3.44 

3.46 

3.47 


.01 

3.89 

4.06 

4.16 

4.22 

4.32 

4.36 

4.41 

4.45 

4.48 

4.54 

4.58 

4.61 

4.63 

4.65 

40 

.05 

2.86 

3.01 

3.10 

3.17 

3.22 

3.27 

3.30 

3.33 

3.35 

3.39 

3.42 

3.44 

3.46 

3.47 


.01 

3.82 

3.99 

4.10 

4.17 

4.24 

4.30 

4.34 

4.37 

4.41 

4.46 

4.51 

4.54 

4.57 

4.59 

60 

.05 

2.83 

2.98 

3.08 

3.14 

3.20 

3.24 

3.28 

3.31 

3.33 

3.37 

3.40 

3.43 

3.45 

3.47 


.01 

3.76 

3.92 

4.03 

4.12 

4.17 

4.23 

4.27 

4.31 

4.34 

4.39 

4.44 

4.47 

4.50 

4.53 

100 

.05 

2.80 

2.95 

3.05 

3.12 

3.18 

3.22 

3.26 

3.29 

3.32 

3.36 

3.40 

3.42 

3.45 

3.47 


.01 

3.71 

3.86 

3.93 

4.06 

4.11 

4.17 

4.21 

4.25 

4.29 

4.35 

4.38 

4.42 

4.45 

4.48 

∞ 

.05 

2.77 

2.92 

3.02 

3.09 

3.15 

3.19 

3.23 

3.26 

3.29 

3.34 

3.38 

3.41 

3.44 

3.47 


.01 

3.64 

3.80 

3.90 

3.98 

4.04 

4.09 

4.14 

4.17 

4.20 

4.26 

4.31 

4.34 

4.38 

4.41 


Source: Duncan, D. B., Biometrics, 11:1-42,1955. With permission. 
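
The last row of Table A.5 (infinite error degrees of freedom) can be checked with a short calculation. The sketch below assumes that each entry is the point of the range of r independent standard normal variables at protection level (1 - α)^(r - 1), which is the usual construction of Duncan's new multiple range test; it does not handle finite error degrees of freedom. The function names and integration settings are illustrative choices.

```python
import math


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)


def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def range_cdf(q, r, lim=8.0, n=1600):
    """P(range of r iid N(0,1) variables <= q) by Simpson's rule.

    Uses r * integral of phi(z) * [Phi(z) - Phi(z - q)]^(r - 1) dz.
    """
    h = 2 * lim / n
    total = 0.0
    for k in range(n + 1):
        z = -lim + k * h
        w = 1 if k in (0, n) else (4 if k % 2 else 2)
        total += w * phi(z) * (Phi(z) - Phi(z - q)) ** (r - 1)
    return r * total * h / 3


def duncan_point_inf(r, alpha):
    """Duncan-type point for infinite error df (assumed construction)."""
    level = (1 - alpha) ** (r - 1)  # protection level for r ordered steps
    lo, hi = 0.0, 20.0
    for _ in range(50):  # bisection on the monotone range distribution
        mid = (lo + hi) / 2
        if range_cdf(mid, r) < level:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For r = 2 the result equals the square root of 2 times the two-sided normal critical point, which agrees with the first column of the infinite-df row.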
A 

Absolute deviations, values, 27 
Agricultural study, 148-150 
AIC. See Akaike Information Criteria (AIC) 
AICC, 557,558, 567, 588,593 

covariance structure, 565, 576, 577 
Aircraft engines, 145-146 
Akaike Information Criteria (AIC), 37, 557, 
558,567, 577,588 
AR(1), 562,593 

covariance structure, 565, 576 
Analysis of variance (ANOVA), 35,103, 240, 
241, 521, 607 

balanced incomplete block design 
structure, 95 

bread baking study, 124,125 
bread recipes and baking temperature 
example, 423 

cheese-making experiment, 133,134,135 
column models, 127 
comfort study, 141,148, 512 
companies and insecticide example, 635 
complex split-plot design, 130 
cooking beans example, 117, 118, 119, 120 

crossover design, 86, 103, 106, 109, 146, 602, 604 

display case study, 125,130 
drug heart rate study, 503 
expected mean squares, 343 
family attitude study, 517 
flour milling experiment, 121 
IBD design structures, 96,125 


Latin square, 88, 89, 92, 93 
loaf volume, 425-426 
nested designs structure, 148 
nitrogen and irrigation, 481 
nitrogen by irrigation example, 486 
one-way model, 236,296 
one-way treatment structure, 84, 88, 106, 107, 108 

paint-paving experiment, 190,191 
RCBD, 84, 85, 94,104,107,108,110,144 
repeated Latin square treatment 
structure, 90 

row and column blocks example, 98 
soybean varieties nested within 
maturity groups, 144 
split-plot designs, 109,110,145 
strip-plot designs, 111, 112, 473, 474, 486 
strip-split-plot design, 484,485, 489 
three-way factorial arrangement, 87 
two-way nested treatment structure, 146 
two-way random effects model, 339 
two-way treatment structures, 89, 90, 92, 94, 103, 104, 109, 110, 112, 250, 251 

type I sums of squares, 342 
type III, 605 

unbalanced two-way experiment, 201, 204 

unweighted means, 233, 236, 237 
variance components inferences 
methods, 338 
wheat yield data, 427 
whole-plot design structure, 109 
Animal comparison rations, 453-456, 458 
asymptotic covariance matrix of 
estimates, 458 
covariance parameters, 454 
fixed effects type III tests, 454 
pairwise comparisons, 455 
SAS-mixed code, 454 
temperature force means, 455 
Tukey adjustment, 455 
Animal genetics, 143 

nested treatment structure, 143 
strip-plot designs, 143 

ANOVA. See Analysis of variance (ANOVA) 
AR(1) structure, 556 
AIC, 562,593 
drug heart rate study, 564 
estimated R matrix, 561, 562 
heterogeneous variances, 562 
Assembly line workers, 365-381 
covariance parameter, 365, 367 
data, 366 
efficiency, 365 

estimates of variance components, 371 

expected mean square, 367, 371 

JMP fit model screen, 380 

mean square, 367 

ML, 367 

MVQ, 367 

MVQNB, 367 

reduced model, 371, 375, 381 
REML, 367, 381 
REML estimates, 374, 375 
Satterthwaite approximation, 373, 375 
Satterthwaite-type confidence 
intervals, 374 

type I analysis, 367, 371, 377 
type I sums of squares, 375 
type I tests of hypotheses, 369 
type II, 367 
type III, 367 

type III analysis of variance, 368, 371 
type III sums of squares, 373 
type III tests, 370,373 
variance components, 369 
Wald-type confidence intervals, 375 
Asymptotic covariance matrix of estimates, 458 
Asymptotic sampling distribution, 26 


B 

Baking bread example. See Bread baking study 
Balanced incomplete block design structure, 94 
analysis of variance, 95 


Balanced model 

covariance matrix, 406 
data set, 408 
one-way, 319-321 

Balanced one-way random effects treatment 
structure, 356 

Balanced two-way experiments, 187-198 
case study, 187-198 
interaction effects contrasts, 188 
main effect means contrasts, 187 
multiple comparisons, 195 
paint-paving example, 189-192 
quantitative treatment factors 
analysis, 193-194 

Balanced two-way treatment structures, 
181-184, 207, 243-244 
computer analyses, 184 
effects model, 182,209-230 
fat-surfactant example, 239-242 
interactions, 182, 202-203 
main effects, 183, 202-203 
means model, 181 
model definitions, 181-182,199 
multiple comparisons, 206 
parameter estimation, 182,199-200 
population marginal means, 204-205 
simultaneous inferences, 206 
testing whether all means are equal, 201 
unequal subclass numbers case 
study, 239-244 

unequal subclass numbers means 
model analysis, 199-207 
Bartlett's test, 25, 26 

one-way treatment structure, 24 
Batch error terms, 121 
Benjamini and Hochberg method, 45 
adjusted p-values, 60 
control FDR, 44, 58-59 
LDC, 59 

simultaneous inference procedures, 58 
Best estimates, 504 

Best linear unbiased estimator (BLUE), 395, 398 

estimable functions, 397 
machine person example, 406 
Best linear unbiased predictor (BLUP), 397, 398 

BF test GLM code, 35,36 
Bias prevention, 73 

BIC. See Schwarz's Bayesian Criterion (BIC) 
Big block analysis of variance 

nitrogen by irrigation example, 486 
Block design structure, 76 
Block effects, 74 
Block interaction, 474 
Blocking, 71, 97 

Blocks and replications example, 94-97 
BLUE. See Best linear unbiased estimator 
(BLUE) 

BLUP. See Best linear unbiased predictor (BLUP) 
Bonferroni-Holm method, 58 

simultaneous inference procedures, 58 
Bonferroni's method, 45, 48, 49, 51, 55,195 
contrast, 60 
differences, 53 

simultaneous inference procedures, 48 
widths of confidence intervals, 57 
Bootstrap methodology, 51 
Box's correction 

conservative estimate, 548-549 
sorghum varieties and fertilizer levels, 549 
Bread baking study, 122-125 
baking bread, 122-124 
combinations multilevel designs, 122-124 
IBD design structures, 123,124,125 
split-plot design structures, 122-124 
Bread recipes and baking temperature 
example, 422-426, 433-434 
analysis of variance, 423,425-426 
means comparing, 434 
randomization, 423, 424 
SAS-mixed code, 426 
split-plot analysis, 422-425 
Brown and Forsythe's test, 25, 27 
equality of variances, 35 
homogeneity of errors variance, 35 
one-way treatment structure, 24 

C 

Cell means, 201 

Change-over designs, 141-142. See also 
Crossover designs 
Cheese making example, 131-135 
analysis of variance, 133,135 
analysis of variance for batch 

experimental unit design, 134 
batch design, 133 
chamber design, 132 
chamber experimental unit, 133 
combinations multilevel designs, 131-134 
randomization scheme, 131,132,133 
repeated measures multilevel designs, 
131-134 

split-plots designs, 131-134 
strip-plot designs, 131-134 


Chi-square 

sphericity tests, 552 

Coffee price example, 289-290, 629-630, 
632-634, 638-639 
analysis of variance, 633 
estimates of mean, 634 
estimates of variance components, 634 
JMP data screen, 638 
JMP fit model screen, 639 
nested designs analysis, 629 
random effects model, 632 
REML estimates, 639 
three-way, two-level, nested system, 289 
type I sums of squares, mean squares, 633 
Column blocks, 80 
Column marginal mean, 201 
Column models 

analysis of variance, 127 
Combinations multilevel designs, 101-154 
bread baking study, 122-124 
cheese-making experiment, 131-134 
Comfort study, 139-141,147-148, 510-516, 
524-527,570-583, 631-632 
analysis of variance, 141,148, 512, 632 
comparisons between clothing, 580 
comparisons between environments, 582 
comparisons between sexes, 581 
comparisons between times, 582 
complex, 510-516,524-527, 570-583 
compound symmetry covariance 
structure, 576 

covariance parameter estimates, 524, 577 
data, 511, 574, 632 
data arrangement, 140 
environment x time means, 513 
environment x time means with 
comparisons, 515 
Fisher's LSD method, 512 
fit statistics, 576 
JMP fit model screen, 637 
JMP screen, 636 
least squares from JMP, 638 
least squares means, 526 
linear and quadratic contrasts, 578, 579 
linear and quadratic contrasts for 
time, 525 

means with LSD, 513 
measures of linear and quadratic 
trends, 514 

nested designs analysis, 629 
pairwise comparisons, 526 
persons and temperatures assignments 
to chambers, 147 
repeated measures complex examples, 
573-582 

repeated measures factor, 575 
response over time, 582 
SAS commands, 524 
SAS-mixed statements, 579 
Satterthwaite estimated degrees 
of freedom, 514 

sex x clothing means with LSD, 513 
split-plot-in-time analysis, 575 
standard error of comparison, 513 
strip-plot designs, 139-140,147 
three-way least squares means, 580 
Tukey multiple comparisons, 638 
two-way least squares means, 580 
type III tests of fixed effects, 578 
type II tests of fixed effects, 525 
Companies and insecticide example, 627-628, 
630-631, 634-635 
analysis of variance, 635 
data, 628 

means estimates, 631 
model, 634 

nested designs analysis, 627-628 
treatment structure, 628 
Comparisons 
area means, 519 
comfort study, 515 
error rate, 44 

family attitude study, 592 
mean to mean, 54 
percentage points, 54 
Complete block design structures, 79 
Completely randomized design (CRD), 71 
analysis of variance, 86,103,106,109,146 
cupcakes, 102 
design structure, 79, 84 
error term and variance source, 113 
means and effects models, 86 
one-way treatment structure, 1-17, 21-42 
two-way treatment structures, 159 
whole-plot structure, 109 
Complex split-plot design, 130 
Complex three-way random effect, 339-343 
Compound symmetry covariance 
structure, 556 
comfort study, 576 
estimated R matrix, 562 
heterogeneous variances, 562 
Computational algorithms, 321 
Computer analyses, 184 
Conditional error, 13-14 


Confidence intervals, 247-249, 358 
graph, 354 
region, 378 

Construct sequence treatments, 141 
CONTRAST statement, 216 
Cooking beans experiment example, 116-120 
analysis of variance, 117,118,119,120 
batch analysis, 119 
cooking methods, 118 
experimental unit size, 120 
randomization scheme for assigning 
varieties, 116 

randomization scheme of assigning 
cooking methods, 117 
randomly assign to batches, 118 
row analysis, 117 
strip-plot designs, 116-119 
whole plot or row design structure, 119 
Covariance matrix, 590 
Covariance parameter estimates, 37 
comfort study, 577 
crossover designs, 615 
crossover designs six sequences, 620 
family attitude study, 528, 589 
Covariance structures comparison, 587 
CRD. See Completely randomized design (CRD) 
Crossover designs, 71, 599-618, 623-625 
analysis, 599-626 
ANOVA, 602, 604 
contrasts, 615 

covariance parameter estimates, 615 

definitions, assumptions, and models, 599 

ESTIMATE options, 607 

expected mean squares, 605 

fixed effects, 609 

F-statistic for testing, 602 

GRIZ, 604 

least squares means, 608, 616 
milk yields, 622 

more than two periods, 609-615 
more than two treatments, 616-625 
pain score, 622 

period cell means sequencing, 613 

SAS code, 613 

SAS-GLM code, 604, 605 

SAS-mixed code, 606 

sequences set, 600 

strip-plot designs, 141 

subject error mean square, 604 

subjects contrast, 612 

three period/two treatment, 610, 617-618 

treatment, 609, 616 

treatment main effect means, 605 
treatment/4 period, 623 
two period/two treatment 
designs, 600-608 
two-way means, 609 
type III ANOVA, 605 
type III tests of fixed effects, 608, 615 
wash-out period, 607 
Crossover designs six sequences, 617-618, 
619-621 

covariance parameter estimates, 620 
estimable least squares means for 
treatments, 621 
least squares means, 620 
pairwise comparisons, 621 
type III tests of fixed effects, 620 
Cupcake experiment, 102-115 
design structure, 102,108 
IBD design structures, 110 
RCBD, 103 

split-plot design, 104,106 
split-plot design structure, 109 
strip-plot designs, 110, 111 
volume evaluation, 102 

D 

Data-driven comparisons, 45 
Data snooping, 45 
Designing experiments, 71-100 
blocks and replications, 94-96 
design structure, 77 
diets, 83-87 
examples, 83-97 
house paint, 88-89 
nitrogen and potassium levels 
example, 92-93 
row and column blocks, 97 
steel plates, 90-91 
structures, 66-79 
treatment structure, 77 
treatment structure types, 80-82 
types, 78-79 
Design matrix, 155 
Design structure, 71 

blocks and replications, 94 
cupcake treatment structure, 102, 108 

nested or hierarchical structure, 105 
nitrogen and potassium levels 
example, 92 

oven treatment structure, 108 
row and column blocks, 97 
split-plot and repeated measures, 83 


Diets example, 83-87 

designing experiments, 83-87 
treatment structure, 83, 87 
Display case study, 125-130 
analysis of variance, 125,130 
intensities assignments, 125 
levels of lighting at temperature, 128 
levels of packaging at temperature, 129 
strip-plot designs, 125-130 
temperature intensity levels, 127 
temperatures assignments, 125 
Drug heart rate study, 500-510, 519-523, 
535-537,558-570,592-596 
analysis of variance, 506 
AR(1) conditions, 564 
comparisons of time means, 522 
compound symmetry, 563 
covariance parameter estimates, 521 
differences of least squares means, 523 
drug and time means, 521 
drug effects, 500,506 
estimated covariance matrices, 563 
estimated standard error, 507 
estimate options, 523 
graph of means, 508 
H-F conditions, 536, 560, 563 
linear and quadratic contrasts in time, 523 
linear and quadratic trends estimates, 508 
LSD value, 507 
pairwise comparisons, 522 
SAS-mixed code, 520 
Satterthwaite estimated degrees 
of freedom, 509 
Satterthwaite's method, 504 
statistical tests, 521 
time means comparisons, 507, 509 
unstructured conditions, 564 
vector of responses, 535 
Drugs and errors examples, 25-38 

one-way treatment structure, 25-28, 31-32 
Duncan's method 

multiple comparison procedure, 67 
multiple range tests, 61, 64 
proc GLM code, 66 

simultaneous inference procedures, 64 
Dunnett's method, 45, 55 

simultaneous inference procedures, 53 


E 

EER. See Experimentwise error rate (EER) 
EERC. See Experimentwise error rate under 
the complete null hypothesis (EERC) 
Effects model analysis, 164, 209-230 

balanced two-way treatment structures, 
209-230 

completely randomized design 
structure, 86 

computer analyses, 226-227 
defined, 182 

estimable functions in SAS, 213-216 
exercises, 229 
hypotheses, 219, 220 
model definition, 209 
parameters estimates and type I analysis, 
209-212 

population marginal means and least 
squares means, 226 
type III hypotheses, 220 
types I-IV estimable functions in 
SAS-GLM, 222-225 
types I-IV hypotheses, 217-221 
unequal subclass numbers, 209-230 
Engines on aircraft example. See Aircraft 
engines 

Environmental conditions, 139 
Equality of means 

hypothesis testing, 11 
one-way model, 14 
task and pulse rate data, 15 
Equal variances, 22 
Error rates 

degrees of freedom, 113,117 
nitrogen and potassium levels 
example, 92 
ratio, 66 

simultaneous inference procedures, 44 
structure, 92 
Error terms 
designs, 113 
split-plot designs, 113 
strip-plot designs, 113 
variability sources, 113 
Error variances, 22 
Estimable functions 
basis set, 215 
general forms, 214 
ESTIMATE options, 261 
ESTIMATE statement, 216 
Estimated population marginal 
means, 173 

Estimated standard errors, 504 
cell means, 176 
difference, 408 
GPA data, 176 

machine person example, 408 


Estimating variance components method, 
309-336 

algebraic method, 293-294 
assembly line workers, 371 
balanced one-way model, 318-321, 319-321 
balanced one-way model REML solution, 
323-324 

computing expected mean squares, 
292-306 

description, 325-326 
exercises, 308, 334-336 
general random effects model in matrix 
notation, 290-291 

Hartley's method of synthesis, 295-306 
JMP variance components estimation, 
329-333 

maximum likelihood estimators, 318-321 
MIVQUE method, 325-328 
moments method, 309-317 
one-way random effects model, 290-291, 
313-314 

random effects nested treatment 
structure, 289 

restricted or residual maximum likelihood 
estimation, 322-324 
two-way design, 315-317 
unbalanced one-way model, 309-312 
unbalanced one-way design, 327-328 
Examples. See specific name 
Expected mean square 
analysis of variance, 343 
assembly line workers, 367, 371 
and F-statistic, 343 
Experimental unit, 71, 72 
cooking beans example, 120 
flour milling experiment, 121 
horse feet, 136 
selection, 74 
size, 120,121 

Experiments. See also specific name 
applications, 72 

combinations and generalizations, 80 

design, 73 

designing, 83-87 

objectives, 72 

randomization scheme, 78 

statistical objective, 73 

times, 72 

treatments, 72 

Experimentwise error rate (EER), 44, 45 
Experimentwise error rate under the 
complete null hypothesis 
(EERC), 44, 61 





F 

Factorial arrangement, 82 
treatment structure, 81 
False discovery rate (FDR), 44, 45 

Benjamini and Hochberg method, 58 
contrast, 60 

Family attitude study, 501-502, 516-519, 
525,527-529, 583-592 
analysis of variance, 517 
comparison of person means, 518 
comparison of time means, 518 
comparisons, 592 
covariance parameter estimates, 528, 589

covariance structures comparison, 587 
data, 517, 583 

estimated covariance matrix, 590 
fit statistics, 588 
fixed effects, 589 
least squares means, 591 
pairwise comparison, 529, 591 
repeated measures complex examples, 
583-591 

SAS commands, 527-529, 587 
two-way least squares means, 591 
two-way means, 528 
type III tests of fixed effects, 528, 590
Familywise error rate (FWER), 44, 45, 47, 
49-51, 54, 58, 60, 64 

Fat-surfactant example, 239-242, 265-267 
balanced two-way treatment structures, 
239-242 

least squares means, 241, 268 
pairwise comparisons, 242 
SAS analyses, 240 
specific volumes, 240 
treatment combination, 266 
two-way combinations, 265 
volumes, 266 

FDR. See False discovery rate (FDR) 
Fertilizer. See Moisture and fertilizer 
example; Sorghum varieties 
and fertilizer levels; Wheat 
varieties example 
Final regression model, 448 
First-order Taylor's series, 349 
Fisher's least significant difference 
method, 64, 65 
comfort study, 512 

simultaneous inference procedures, 47 
Fit statistics 

comfort study, 576 


family attitude study, 588 
four covariance structures, 576 
Fitting-constants method, 306 
Fixed effects, 287 
determination, 288 
drug heart rate study, 568 
family attitude study, 589 
SAS-mixed commands, 568 
type III tests, 568 
Fractional factorial arrangement 
treatment structure, 81 
F-statistic, 9, 47, 66-68, 202-203, 216, 248-249 
analysis of variance, 343 
crossover designs, 602 
multiple comparisons, 195 
Full model, 13 

covariance parameter estimates, 446 
evaluate likelihood function, 346 
fixed effects solution, 447 
proc mixed code and results for fitting, 346 
response surface, 447 
SAS-mixed code, 446 
FWER. See Familywise error rate (FWER) 

G 

General random effects model 
components, 291 

maximum likelihood estimators, 318 
G-G. See Greenhouse and Geisser (G-G) 
estimate 

Grade point average data, 175-178 
estimated standard errors, 176 
least squares estimator, 176 
SAS-GLM code, 178 
students to classes population 
distribution, 178 

two-way treatment structures, 175, 176
Greenhouse and Geisser (G-G) estimate 
adjusted degrees of freedom, 552 
sorghum varieties and fertilizer levels, 
549, 552 

Grinding wheat example, 120-121 
analysis of variance, 121 
batch error terms, 121 
error terms, 121 
experimental unit size, 121 
randomization process, 121 
strip-plot designs, 120-121 
whole-plot analysis, 121 
whole-plot method, 120 
Grizzle's data using crossover designs, 602, 603, 604

H

Hartley's method of synthesis 

computing expectations, 299-300 
estimating variance components, 295-306 
expected values, 305 
random effects models, 295-306 
sums of squares expectation, 300 
Hartley's test, 25 
F-Max, 23 

Henderson's method I 
sums of squares, 315 
Henderson's method III, 306 
sums of squares, 316-317 
Henderson's methods I, II, III, and IV, 305 
Herbicide. See Nitrogen by irrigation
example; Soybeans and herbicides
Heterogeneous AR(1) structure, 556 
Heterogeneous compound symmetry 
structure, 556 
Heterogeneous errors 

one-way treatment structure, 21-42 
variances, 22 

H-F. See Huynh-Feldt (H-F) structure 
Hierarchical design structure 
analysis of variance table, 115 
split-plot designs, 627 
teaching method study, 115 
Higher-order treatment structures 
analysis, 271-276 
analysis, 284 
cross classified, 284 
Holm modification, 58 
Homogeneity of errors variance, 35 
Homogeneity of variances tests, 23-25 
Homogeneous errors 

linear combinations simultaneous tests, 7-8 
one-way treatment structure, 1-17 
Horse feet experiment, 136-139 
analysis of variance, 139 
experimental units sizes, 136 
IBD design structures, 138 
strip-plot designs, 136-138 
three-way treatment structures, 137 
treatment combinations assignment, 136, 138
House paint example, 88-90 
designing experiments, 88-89 
nested within squares design, 143 
two-way treatment structure, 89 
Huynh-Feldt (H-F) structure, 555, 556 
correction, 549 

drug heart rate study, 536, 560, 563 


estimated R matrix, 560 
sorghum varieties and fertilizer 
levels, 549 

Hypothesis testing, 247-249 
mixed models, 396 
nested designs analysis, 633-635 
two-way treatment structures, 247-248 
variance components inferences 
methods, 337-346 

I


IBD. See Incomplete block design (IBD) 
structures 

Incomplete block design (IBD) structures, 
71, 79, 80 

analysis of variance, 96, 124, 125
bread baking study, 123, 124, 125
cupcake experiment, 110 
horse foot experiment, 138 
temperature assignments, 123 
whole-plot, 123-125 

Insect damage in wheat, 349-350 
computations, 314 
REML, 349 

Interaction hypotheses, 190 

J


JMP 

coffee prices example, 638 
comfort study, 636, 638 
data, 331, 638 
data set, 329 

estimating variance components 
methods, 329-333 
fit model screen, 330 
REML, 331, 332, 333
semiconductor industry, 490-491 
strip-plot designs, 486-490 
variance components estimation, 
329-333 

JMP computations 
fertilizer, 459-465 

random effects model case study, 379 
split-plot designs, 459-465 
JMP fit model, 330, 332 

assembly line workers, 380 
coffee prices example, 639 
comfort study, 637 

unbounded variance components, 333 
K

Kackar-Harville type of adjustment, 395 
Kenward-Roger's method, 555 
Kronecker or direct product, 319 

L


Large balanced two-way experiments 
having subclass numbers, 
231-238 

computer analyses, 236 
exercises, 238 
feasibility problems, 231 
method of unweighted means, 235 
simultaneous inference and multiple 
comparisons, 233-234 
unweighted means, 232 
Latin square, 71 

analysis of variance, 89, 92, 93 
analysis of variance table, 88 
arrangement, 90, 92, 93
design, 80, 90 
design structure, 88, 90, 97 
house paint, 88 

one-way treatment structure, 88 
row and columns, 97 
steel plates, 90 
temperature level, 91 
treatment structure, 88, 89, 91, 93 
Least significant difference (LSD)
procedure, 45, 53, 64, 65, 195
comfort study, 512 
comparing, 68 
drug heart rate study, 507 
simulated error rates, 47 
simultaneous inference procedures, 46 
Least squares estimator, 164 
GPA data, 176 
Least squares means 
comfort study, 526 
crossover designs, 608, 616 
crossover designs six sequences, 620 
family attitude study, 591 
fat-surfactant example, 241, 267, 268 
machine person example, 408 
multicenter drug experiment, 595 
pairwise comparisons, 267 
plot, 242 

three way interactions, 465 
Tukey HSD option, 465 
Least squares solution, 168 


Levene's test, 25 
computations, 27 
one-way treatment structure, 24 
Likelihood ratio test, 344-345 
statistic, 538 

variance components inferences 
methods, 344 
Wilks', 538 

Linear and quadratic trends 
comfort study, 514, 578 
drug heart rate study, 508 
SAS-mixed code with estimate 
statements, 442 
Linear comparisons, 59 
drug heart rate study, 505 
one-way treatment structure, 4 
proc mixed code, 57 
SAS-Mixed code, 56 
Linear model 

matrix form, 171-172 
mixed matrix notation, 386 
testing hypotheses, 171-172 
LSD. See Least significant difference (LSD) 
procedure 

M 

Machine person example, 401-417 
BLUE, 406 

estimate of standard error of 
difference, 408 
JMP analysis, 415-417 
nested designs analysis, 636-639 
pairwise difference, 408 
productivity of machines, 401 
productivity scores, 402 
Satterthwaite confidence intervals, 412 
two-way mixed model, 401 
unbalanced two-way mixed model, 409 
Main effect contrasts, 190 
Main effects means, 283 
Making cheese example. See Cheese 
making example 

MANOVA. See Multivariate analysis of 
variance (MANOVA) methods 
Marginal means 
best estimates, 227 

unbalanced two-way experiment, 201 
Matrix form of model, 155-180 
basic notation, 155-160 
connectedness, 171 
Matrix form of model (Continued)
estimability and connected designs, 
169-171 

estimable functions, 170 
least squares equations, 161-168 
linear model testing hypotheses, 171-172 
one-way treatment structure, 156-157, 
167-168 

population marginal means, 172-178 
set-to-zero restrictions, 166 
simple linear regression model, 156 
sum-to-zero restrictions, 165-166 
two-way treatment structure means 
model, 158-160 

Mauchly's criterion sphericity tests, 552 
Maximum likelihood estimates (ML), 556 
assembly line workers covariance 
parameter, 367 

difference between machines, 414 
estimates of mean for machine, 413 
estimates of variance components, 403, 411

estimating variance components 
methods, 318-321 
general random effects model, 318 
machine person example, 402 
models variance components, 347 
proc mixed code to compute, 321 
proc mixed code to obtain, 322
solutions, 348 
two-way mixed model, 390 
Means model, 158 

balanced two-way treatment 
structures, 181 
character, 21 

completely randomized design 
structure, 86 
defined, 181 
matrix form, 160 
raw, least squares, and weighted 
GPA data, 176, 177
type III hypotheses, 220 
Mean square, 367 
Means/Walker option, 66 
Meat display case study. See Display 
case study 
Method of moment 
estimators, 311 

machine person example, 402 
technique, 316 

Method of unweighted means, 235 
Milk yields, 622 


Minimum variance quadratic unbiased 
estimators (MIVQUE), 309, 325 
assembly line workers covariance 
parameter, 367 

difference between machines, 414 
estimates of mean for machine, 413 
estimates of variance components, 403, 411 
estimating variance components methods, 
325-328 

machine person example, 402 
no bound, 367 

proc mixed code to compute, 328 
proc mixed code to obtain, 328 
variance components, 329 
MINQUE method, 387, 394 
Missing treatment combinations, 245-247 
two-way experiment, 246 
MIVQUE. See Minimum variance quadratic 
unbiased estimators (MIVQUE) 
MIVQUE0. See Minimum variance quadratic 
unbiased estimators (MIVQUE) 
Mixed models 
analysis, 385-400 
balanced and unbalanced, 401 
best linear unbiased prediction, 397 
case studies, 401-420 
confidence intervals, 395 
estimation, 394 
exercises, 399-400, 418-420
fixed effects, 394-396 
maximum likelihood, 390-391 
MINQUE method, 394 
mixed model equations, 397-398 
moments method, 387-389 
parts, 385 
procedures, 555 
random effects, 387-393 
residual maximum likelihood, 392-393 
testing hypotheses, 396 
two-way mixed model, 401-408 
unbalanced two-way data set JMP 
analysis, 415-416 

unbalanced two-way mixed model, 
409-414 

Mixed-up split-plot design, 448-450 

ML. See Maximum likelihood estimates (ML) 

Model 

comparison procedure, 15 
fixed or fixed effects, 288 
mixed or mixed effects, 288 
random or random effects, 288 
MODEL II, 309 
MODEL statement SAS-GLM, 223
Moisture and fertilizer example, 440-448, 449
dry matter graphic, 445 
dry matter measurements, 440-442 
linear and quadratic trends, 443, 444 
prediction surface for dry matter, 449 
Satterthwaite approximation, 444 
split-plot designs, 440-443 
Moments method 

estimating variance components 
methods, 309-317 
mixed models, 387-389 
Multicenter drug experiment, 592-596 
covariance parameter estimates, 595 
hypothesis tests of fixed effects, 595 
least squares means, 595 
main effect means, 596 
pairwise comparison, 596 
repeated measures complex examples, 
592-596 

SAS commands, 594 
time least squares means, 596 
Multilevel designs 

bread baking study, 122-124 
characteristics, 114 
cupcake experiment, 114 
Multilocation study 

alfalfa varieties assignment, 149 
analysis of variance, 150 
experimental units display, 149 
repeated measures, 150 
repeated measures complex examples, 
592-596 

strip-plot designs, 148-149 
Multiple comparisons, 43-70, 52, 66 
critical differences, 55 
SAS system code, 54 
and simultaneous inference 
procedures, 43-70 
types, 45 

Multiple range tests 

pairwise comparisons example, 66-67 
simultaneous inference 
procedures, 61-63 
Multivariate analysis of variance 
(MANOVA) methods 
cross-products matrix, 546 
error sums of squares, 546 
interaction tests, 547 
repeated measures experiments, 537-546 
SAS-GLM code, 545 


sorghum varieties and fertilizer levels, 
545, 547, 553 
test for time, 552 
Multivariate t method, 55-56 

simultaneous inference procedures, 55 
widths of confidence intervals, 57 
MVQ. See Minimum variance quadratic 
unbiased estimators (MIVQUE) 
MVQNB. See Minimum variance quadratic 
unbiased estimators (MIVQUE) 

N 

Nested designs analysis, 627-650 
coffee price example revisited, 629 
comfort experiment revisited, 629 
companies and insecticides, 627-628 
confidence interval construction, 633-635 
definitions, assumptions, models, 627-629 
machine person example, 636-639 
parameter estimation, 629-632 
testing hypotheses, 633-635 
Nesting 

analysis of variance, 148 
animal genetics, 143 
coffee prices example, 289 
observable engine-aircraft 
configurations, 146 
strip-plot designs, 142-150 
treatment structure, 143 
Nitrogen and potassium levels 
example, 92-94 
designing experiments, 92-94 
design structure, 92 
treatment combinations, 93 
Nitrogen by irrigation example, 477-490 
analysis of variance, 481 
big block analysis of variance, 486 
covariance parameter estimates, 479 
data comparisons, 479 
estimate statements, 480 
field layout, 481 
fixed effects tests, 479 
REML analysis, 479 
SAS-mixed code, 478, 479 
seeding rates, 484, 486
split-plot design, 481, 482
strip-plot as subplots, 486 
strip-plot data, 478 
strip-plot design, 481, 482 
strip-split-plot design, 484 
type III sums of squares analysis, 478 
Noint option, 249
Null hypothesis deviations, 172 
Null model likelihood ratio test, 37 
Nutrition scores example, 277-284 
main-effects means, 283 
SAS-GLM, 283 

three-way treatment structures, 277 
type IV hypotheses, 282 

O


O'Brien test, 25 
scores, 28 
statistic, 27 

One-fold nested design, 358 
One-way model 

analysis of variance, 296 
analyzing treatment, 22 
effects, 162 
equality of means, 14 
means, 162 

normal equations, 162 
random effects, 296, 362 
simulation code and results, 362 
One-way random effects model 
estimating variance components, 290-291

estimating variance components 
methods, 313-314 
random effects models, 290-291 
One-way treatment structure, 81 

analysis of variance, 84, 88, 106, 107, 108

arrangement, 90 

Bartlett's test, 24 

Brown and Forsythe's test, 24 

classification, 83 

combined with two-way treatment 
structure, 82 
comfort experiment, 147 
comparing all means, 34-37 
completely randomized design 
structure, 1-17, 21-42 
computer analyses, 15 
diets, 83 

drugs and errors examples, 25-28, 31-32 
Hartley's F-Max test, 23 
heterogeneous errors, 21-42 
homogeneity of variances tests, 23-25 
homogeneous errors, 1-17 
homogeneous errors linear combinations 
simultaneous tests, 7-8 
Latin square design structure, 88 
Levene's test, 24 


linear combinations, 4 

linear combinations inferences, 29-30 

loaf design, 424 

matrix form of model, 156-157, 167-168
means and effects, 167 
model definitions, 22 
model definitions and assumptions, 1 
O'Brien's test, 25 
parameter estimation, 2-3, 22 
principle of conditional error, 13-14 
Satterthwaite approximation for degrees 
of freedom, 33 

tasks and pulse rate example, 5-6,15 
testing and equality of all means, 11 
tests and confidence intervals, 4 
Optimal design treatment structures, 82 
Orthogonal components, 551 
Orthogonal polynomials expected values, 194 
Oven level analysis. See Bread baking study 
Overspecified or singular models, 164 

P


Pain score, 622 

Paint-paving example, 189-192 
analysis of variance, 190, 191
balanced two-way experiments, 189-192 
cell means, 189 
interaction hypotheses, 190 
main effect contrasts, 190 
single-degree-of-freedom tests, 191 

Paired-association learning task 
experiment, 26 

Pairwise comparisons, 21, 28, 45 
all least squares means, 281 
animal comparisons, 455 
comfort study, 526 
crossover designs six sequences, 621 
drug heart rate study, 522 
equality test, 53 
example, 51-53 
family attitude study, 529, 591 
fat-surfactant example, 242 
least squares means, 242 
least squares means for effect fat × 
surfactant, 267

multicenter drug experiment, 596 
multiple range, 66-68 
percentage points, 52 
ration by packaging force, 456 
SAS system code, 52 

simultaneous inference procedures, 51-52 
unequal variance model, 38 





Pairwise difference, 408 
Parameters 

estimable functions, 170 
functions, 504 
Pooled estimate, 3 
variance, 10 

Population marginal means, 204-205 
Principle of conditional error, 13, 15
PRIORTRT, 614 

Q 

Quadratic contrasts 
comfort study, 514, 578 

R


Random effects, 287, 386 
batch design, 133 

bread recipes and baking temperature 
example, 423, 424

cheese-making experiment, 131, 132, 133
cooking beans example, 117 
determination, 288 
flour milling experiment, 121 
one-way model, 296, 362 
SAS-mixed commands, 567 
Random effects model 

algebraic method, 293-294 
case study, 365-381 
computing expected mean squares,
292-306

confidence intervals, 374-378 
data set, 365-366 
estimating variance components, 
290-291 
estimation, 367 

general random effects model, 290-291 
Hartley's method of synthesis, 295-306 
inference procedures, 337 
JMP computations, 379 
in matrix notation, 290-291 
model building, 368-369 
one-way random effects model, 290-291 
random effects nested treatment 
structure, 289 
reduced model, 370-373 
and variance components, 287-308 
Randomized complete block design 
(RCBD), 71, 77 

analysis of variance, 84, 85, 94, 104, 107,
108, 110, 144

cupcake experiment, 103 


cupcake experimental unit, 107 
error term, 113 

structure, 79, 84, 85, 94, 103, 104, 107,
108, 113, 144
variance source, 113 
whole-plot design, 105, 110
Raw, least squares, and weighted means,
176, 177

RCBD. See Randomized complete block 
design (RCBD) 

Reduced model, 13 

assembly line workers, 371, 375, 381 
evaluate likelihood function, 346 
proc mixed code and results 
for fitting, 346 
Reduced regression model 

covariance parameter estimates, 447 
SAS-mixed code, 447 
solution for fixed effects, 450 
REML. See Restricted maximum likelihood 
estimates (REML) 

Repeated Latin square treatment structure, 90 
Repeated measures designs, 71, 101-154
cheese-making experiment, 131-134 
comfort experiment, 575 
complex comfort experiment, 573-582 
complex examples, 573-598 
family attitudes experiment, 583-591 
fit statistics, 567 
multilevel, 131-134 
multilocation experiment, 592-596 
multilocation study analysis of 
variance, 150 
vs. split-plot design, 499 
strip-plot designs, 135-141 
Repeated measures experiments 
analysis, 499-534 

complex comfort experiment, 510-515 

covariance structures, 556 

drugs effect on heart rate, 502-509 

family attitudes, 516-518 

ideal conditions not satisfied, 535-572 

layout, 500 

MANOVA methods, 537-546 
maximum likelihood method, 555 
mixed model methods, 553-567 
model specifications and ideal 
conditions, 500-501 
p-value adjustment methods, 547-552 
restricted maximum likelihood 
method, 555-567 
SAS-mixed procedure, 519-529 
split-plot in time analyses, 502-518 
Repeated option

covariance structure, 559 
drug heart rate study, 559 
SAS-GLM code, 552 
SAS-mixed code, 559 

sorghum varieties and fertilizer levels, 552 
Replication, 71 
vs. subsample, 77 

Residual maximum likelihood. See Restricted 
maximum likelihood estimates (REML) 
Residuals, 28 
Response over time, 515 
Restricted maximum likelihood estimates 
(REML), 309, 326, 556
analysis, 405, 411 

assembly line workers, 374, 375, 381 
assembly line workers covariance 
parameter, 367 
coffee prices example, 639 
confidence intervals, 378 
covariance, 405 

difference between machines, 414 
estimates, 392-393, 403, 411, 413
insect damage in wheat, 349
JMP, 331, 333

machine person example, 402 
mean for machine, 413 
mixed models, 392-393 
nitrogen by irrigation example, 479 
proc mixed code to compute, 324 
proc mixed code to obtain, 324 
repeated measures experiments, 555-567 
SAS-mixed code, 349, 350, 403, 411
Satterthwaite type confidence intervals, 360 
solutions, 348 

variance components, 403, 411 
wheat yield data, 428 
worker data, 361 
Restricted model, 13 
Row and column blocks example, 97-98 
analysis of variance, 98 
designing experiments, 97 
Rows 

error computed, 474 
marginal mean, 201 

S 

Sampling distributions, 23 
SAS 

comfort study, 524 

crossover designs, 613 

family attitude study, 527-529, 587 


multicenter drug experiment, 594 
multiple comparison problems, 54 
pairwise comparisons, 52 
type I sums of squares, 303 
type II estimable functions, 224 
type II sums of squares, 303 
types I-III sums of squares, 305, 316-317
unweighted means, 237 
SAS-GLM 
analysis, 249 
code, 35 
commands, 250 
contrast statements, 283 
crossover designs, 604, 605 
estimable functions identification, 213 
grade point average study, 178 
MANOVA, 545 
model statement, 215 
nutrition scores example, 283 
procedure, 213, 228, 249 
repeated option, 552 
SOLUTION option, 215, 216 
three-way treatment structures, 277-281 
type I analysis and type III analysis, 228 
type I sums of squares, 303 
type I estimable functions, 223 
SAS-mixed code, 51

analysis of variance, 487 

animal comparisons, 454 

comfort study, 579 

crossover designs, 606 

estimate statements, 442 

final regression model, 448 

fixed effects, 568 

full regression model, 446 

HRT_RATE, 520 

linear and quadratic trends, 442 

linearly independent comparisons, 56 

loaf volume, 426 

nitrogen by irrigation example, 478, 479 

procedure, 36 

random subject effect, 567 

reduced regression model, 447 

REML, 349, 350, 403, 411

repeated measures experiments, 519-529 

shear force data, 454 

split-plot designs, 575 

strip-split-plot design, 482,487 

three-way interaction term, 132 

type III sums of squares, 343 

unequal variance model, 37 

using options, 348 

worker data, 357, 361 
SAS-MULTTEST, 45, 51
Satterthwaite approximation, 30, 342, 343 
assembly line workers, 373, 375 
comfort study, 514 

degrees of freedom, 33, 343, 509, 514, 519
drug heart rate study, 504, 509 
machine person example, 404 
moisture and fertilizer example, 444 
one-way treatment structure, 33 
sorghum varieties and fertilizer 
levels, 555 

strip-plot designs, 477 
type I sums of squares, 377 
variance components inferences 
methods, 348 

Satterthwaite confidence intervals, 362, 378 
assembly line workers, 374 
degrees of freedom, 376 
machine person example, 412 
REML solution, 360 
variance components, 350 
Scheffe's method, 45, 49, 51, 53, 55, 60, 195
simultaneous inference procedures, 48 
widths of confidence intervals, 57 
Schwarz's Bayesian Criterion (BIC), 557, 558,
567, 588, 593

covariance structure, 565, 566, 576, 577 
Semiconductor industry, 486-493 
Sequential analysis. See Type I analysis 
Sequential rejective methods, 57-58 
Shear force data, 453 
SAS-mixed code, 454 
split-split-plot model, 454 
Sidak-Holm method, 45 

simultaneous inference procedures, 58 
Sidak method, 51, 55 
contrast, 53, 60 

simultaneous inference procedures, 51 
Simple linear regression model, 156 
Simultaneous inference procedures, 50 
Benjamini and Hochberg method to 
control FDR, 58 
Bonferroni-Holm method, 58 
Bonferroni's method, 48 
caution, 68 

comparing with control example, 54 
Duncan's new multiple range method, 64 
Dunnett's procedure, 53 
error rates, 44 
Fisher's LSD procedure, 47 
least significant difference, 46 
linearly dependent comparisons 
example, 59-60 


linearly independent comparisons, 56 
multiple comparisons, 43-70 
multiple range for pairwise comparisons 
example, 66-67 
multiple range tests, 61-63 
multivariate t, 55

pairwise comparisons example, 51-52 
proc MULTTEST code, 59
recommendations, 45 
Scheffe's procedure, 48 
sequential rejective methods, 57-58 
Sidak-Holm method, 58 
Sidak procedure, 51 

Student-Newman-Keul's method, 61-63 
Tukey-Kramer method, 49 
Waller-Duncan procedure, 65 
Single-degree-of-freedom tests, 191 
Six-sequence design, 617 
SNK. See Student-Newman-Keul's 
method (SNK) 

Soil study experiment, 596-597 
SOLUTION option, 215, 216 
Sorghum varieties and fertilizer levels, 
540-555 

G-G adjusted degrees of freedom, 552 
leaf area index, 540 
MANOVA, 545, 547, 553 
matrix describing transformed variables, 
546 

split-plot designs, 544, 552 
tests, 553 

variety main effects tests, 551 
Soybeans and herbicides, 448, 450-452, 487
cell means, 452 
soybeans weights, 451 
varieties combinations, 450 
Soybeans in maturity groups example, 
143-145 

analysis of variance, 144 
strip-plot designs, 143-144 
treatment structures, 144 
varieties nested maturity groups, 144 
Sphericity tests, 552 
Split-plot designs, 71 
adjusted p-values, 553 
analysis, 421-470 
analysis of variance, 109, 110, 145
analysis of variance table, 115 
bread baking study, 122-124 
bread recipes and baking temperatures, 
422-425 

cheese-making experiment, 131-134 
comfort experiment, 575 
Split-plot designs (Continued)
complex, 130 
concepts, 422 

CR whole-plot structure, 113 
cupcake experiment, 104 
cupcake treatment structure, 109 
errors regressions, 444-447 
error term and variance source, 113 
exercises, 467-470 
family attitude study, 584 
general contrasts comparison, 436-439 
hierarchical designs, 627 
JMP computations, 459-465 
means comparisons standard errors, 
430-433 

means standard errors of differences 
computing, 434-435 
mixed-up split-plot design, 448-450 
model definition and parameter 
estimation, 428-429 
moisture and fertilizer, 440-443 
oven design and treatment structures, 106 
oven treatment structure, 109 
randomization scheme, 105 
randomized complete block 
whole-plot, 109 
RCB whole-plot, 113 
repeated measures, 627 
vs. repeated measures design, 499 
and repeated measures designs, 287 
repeated measures experiments, 502-518 
sample size and power, 455-458 
SAS commands, 584 
SAS-GLM procedure, 550 
SAS-mixed code, 575 
sorghum varieties and fertilizer 
levels, 544, 551, 552, 553
soybean varieties nested within 
maturity groups, 145 
split-split-plot design, 452-454 
and strip-plot designs, 115-130 
teaching method study, 115 
time analysis, 502-518, 544, 550-553,
575, 584
variation, 108 
wheat varieties, 426-427 
whole plot errors, 429 
Split-split-plot design 
shear force data, 454 
split-plot designs, 452-454 
SSCOLUMNS, 301 
SSERROR, 301 
SSINTERACTION, 301 


SSROWS, 301 

Standard error of comparison, 513 
Statistical analysis objectives, 2 
Steel plates example, 90-92 
designing experiments, 90-91 
Stepdown Bonferroni contrast, 60 
Stepdown Sidak contrast, 60 
Strip-plot designs, 71, 76, 101-154
aircraft engines, 145-146 
analysis, 471-498 

analysis of variance, 111, 112, 473, 474, 486
animal genetics, 143 
baking bread, 122-124 
bread baking study, 122-124 
change-over designs, 141 
cheese making, 131-134 
comfort study, 139-140 
comparisons, estimators, and 
variance, 476 

cooking beans experiment 
example, 116-119 
cupcake experiment, 110, 111 
description, 471-474 
error term, 113 

estimated standard errors, 477 
experimental unit size identification, 
101-113 

grinding wheat, 120-121 
hierarchical design, 114 
horse feet, 136-138 
inferences, 475-476 
JMP, 486-490 

meat in display case, 125-130 
multilocation study with repeated 
measures, 148-149 
nested factors, 142-150 
nitrogen by irrigation, 477-479, 486 
repeated measures designs, 135-141 
Satterthwaite approximation, 477 
simple comfort experiment, 147 
soybeans in maturity groups, 143-144 
split-plot 1, 480-481 
split-plot 2, 482 
split-plot 3, 483-484
split-plot 4, 485 

split-plot design structures, 115-130 
trial, 76 

two-way treatment structure, 474 
variance source, 113 
Strip-split-plot design 

analysis of variance, 484, 485, 489 
data, 482 

nitrogen by irrigation example, 484, 487
SAS-mixed code, 482
SAS-mixed code and analysis 
of variance, 487 
semiconductor industry, 489 
Studentized range critical point, 64 
Student-Newman-Keul's method (SNK),
64, 66

multiple comparison procedure, 67 
simultaneous inference procedures, 61-63 
Subsample vs. replication, 77 
Summation notation, 155 
Sum of squares 

computing methods, 305 
deviations, 9 
generating, 222 

Hartley's method of synthesis, 300 
null hypothesis deviations, 172 
testing, 9 

Sum-to-zero restriction, 168 

T


Tasks and pulse rate example, 5-16 
analysis of variance table, 15 
equality of means, 15 
one-way treatment structure, 5-6, 15
proc GLM code, 16 
proc IML code, 16 
pulsation data, 6 
Taylor's series approach, 351 
first-order approximation, 34 
Testing hypothesis 
age, 283 

variance components techniques, 337 
Three by four experiment 

orthogonal contrast coefficients, 193 
orthogonal polynomials expected 
values, 194 

Three-factor experiments, 272 
Three step process, 488 
Three-way, two-level, nested system, 289 
Three-way experiments, 274 
Three-way factorial arrangement, 87 
Three-way interaction term, 132 
Three-way least squares means, 280 
comfort study, 580 
Three-way random effect, 339-343 
Three-way treatment structures, 386 
analysis, 271-276 
balanced and unbalanced 
experiments, 273 
complete analysis, 282-283 
horse foot experiment, 137 


missing treatment combinations 
case study, 277-286 
nutrition scores example, 277 
SAS-GLM analysis, 277-281 
strategy, 271-272 
type I and II analyses, 273 
Treatment combinations 
case study, 265-270 
horse feet experiment, 138 
nitrogen and potassium levels 
example, 93 

two-way treatment structures, 265-270

Treatment structure, 71 

absolute deviations values, 27 
comparison methods, 141 
diets, 83 

diets example, 87 
nitrogen and potassium levels, 92 
two-way factorial arrangement, 87 
Tukey-Kramer method, 51, 65 
comparison, 454 
critical differences, 53 
HSD procedure, 66 

simultaneous inference procedures, 49 
Tukey's method, 45, 456 
animal comparisons, 455 
comfort study, 638 
HSD option, 465 

least squares means for three way 
interactions, 465 
multiple comparisons, 638 
Two-level system, 289 
Two-period two-treatment crossover 
design, 142 
Two-way design 

cell mean parameters, 246 
estimating variance components 
methods, 315-317 
fat × surfactant combinations, 265
having subclass numbers, 231-238 
missing treatment combinations, 246 
Two-way effects model 
estimable functions, 170 
matrix form, 159 
normal equations, 163,166 
type I and type II sums of squares, 217 
type I hypotheses, 219 
Two-way factorial arrangement, 86 
treatment structure, 87 
Two-way least squares means, 252 
comfort study, 580 
family attitude study, 591 
Two-way means model
family attitude study, 528 
normal equations, 163 
Two-way mixed model, 390 
Two-way nested treatment structure, 146 
Two-way random effects model, 315 
analysis of variance, 339 
example, 301 
matrix form, 301 
TS in CR DS, 338 
variance components inferences 
methods, 338 

Two-way treatment structure model, 81, 86, 98

additive levels, 91 

analysis of variance, 89, 90, 92, 94,
103, 104, 109, 110, 112
ANOVA table, 251 
blocks, 95 

computer analyses, 249-252, 262 
connected and unconnected, 171 
contrast statements, 251 
CRD structure data, 159 
cupcakes, 102 
estimable functions, 251 
example, 247-248 
exercises, 253, 263 
GPA data, 176 

grade point average data, 175 
hypothesis testing and confidence 
intervals, 247-248 
matrix form of model, 158-160 
means, 158-160 

missing treatment combinations 
case study, 265-270 

missing treatment combinations effects 
model analysis, 255-263 
missing treatment combinations means 
model analysis, 245-253 
model statement solution options, 252 
parameter estimation, 245-246 
population marginal means and least 
squares means, 261 
randomization, 102, 103
strip-plot designs, 474 
12 treatment combinations, 81 
type I and II hypothesis, 255-256 
type III hypothesis, 257-258 
type IV hypothesis, 259-260 
Type I analysis, 213, 409 

analysis of variance, 210, 213 
assembly line workers, 367, 371, 377 


assembly line workers covariance 
parameter, 367 

difference between machines, 414 
estimates of mean for machine, 413 
estimates of variance components, 411 
expected mean squares and error
terms, 344
hypotheses, 219 
machine person example, 409 
test statistic for interaction, 213 
three-way treatment structures, 273 
Type I hypothesis 

assembly line workers, 369 
two-way treatment structures, 255-256 
Type I sums of squares 
analysis of variance, 342 
assembly line workers, 375 
coffee prices example, 633 
definitions, 217 
each column, 304 
proc mixed code, 342 
SAS code to evaluate, 303 
of SAS-GLM, 303 
SAS-mixed code, 343 
sums, 304 

Type II analysis, 218 

assembly line workers covariance 
parameter, 367 

difference between machines, 414 
estimable functions, 225 
estimates of mean for machine, 413 
estimates of variance components, 411 
hypotheses, 219, 220 
means model hypotheses, 218 
three-way treatment structures, 273 
Type II hypothesis, 255-256 
Type II sums of squares 
definitions, 217 
each column, 304 
SAS code to evaluate, 303 
SAS-GLM, 303 
sums, 304 
Type II tests, 525 
Type III analysis, 220, 221 

assembly line workers, 368, 371 
assembly line workers covariance 
parameter, 367 

covariance parameter estimates, 405 
crossover designs, 605 
difference between machines, 414 
estimable functions, 225 
estimates of mean for machine, 413 
estimates of variance components, 403, 411

machine person example, 405 
means model, 258 
Type III hypothesis 

effects models, 221, 258
two-way treatment structures, 257-258 
Type III sums of squares 
analysis of variance, 403 
assembly line workers, 373 
each column, 304 
machine person example, 403 
nitrogen by irrigation example, 478 
SAS code to evaluate, 303 
of SAS-GLM, 303 
SAS-mixed code, 343 
sums, 304 

wheat yield data, 428 
Type III tests 

assembly line workers, 370, 373 
comfort study, 578 
crossover designs, 608, 615 
crossover designs six sequences, 620 
family attitude study, 528, 590 
Type IV analysis of variance, 261 
nutrition scores example, 278 
Type IV hypothesis 

fat-surfactant example, 266, 267 
nutrition scores example, 282 
SAS-GLM tests, 259, 279
two-way treatment structures, 259-260 
Type IV sums of squares, 303 

U 

Unbalanced incomplete block design 
structures, 95 

Unbalanced one-way design, 327-328 
Unbalanced one-way model, 309-312 
Unbalanced two-way, 347 
analysis of variance, 201, 204 
cell means and marginal means, 201 
data set JMP analysis, 415-416 
mixed models, 409-414, 415-416 
Unbounded variance components, 333 
Unconnected incomplete block design 
structure, 96 

Unequal variance model, 22 
drug group means, 38 
fitting results, 37 
pairwise comparisons, 38
SAS-mixed code, 37 


Unrestricted model, 13 
Unstructured case, 556 
estimated R matrix, 561 
Unweighted means 

analysis of variance, 233, 236, 237 
method, 231 